NAME
UI::KeyboardLayout - Module for designing keyboard layouts
SYNOPSIS
#!/usr/bin/perl -wC31
use UI::KeyboardLayout;
use strict;
# Download from http://www.unicode.org/Public/UNIDATA/
UI::KeyboardLayout::->set_NamesList("$ENV{HOME}/Downloads/NamesList.txt");
UI::KeyboardLayout::->set__value('ComposeFiles', # CygWin too
['/usr/share/X11/locale/en_US.UTF-8/Compose']);
UI::KeyboardLayout::->set__value('EntityFiles',
["$ENV{HOME}/Downloads/bycodes.html"]);
UI::KeyboardLayout::->set__value('rfc1345Files',
["$ENV{HOME}/Downloads/rfc1345.html"]);
my $i = do {local $/; open my $in, '<', 'MultiUni.kbdd' or die; <$in>};
# Init from in-memory copy of the configfile
my $k = UI::KeyboardLayout:: -> new_from_configfile($i)
-> fill_win_template( 1, [qw(faces CyrillicPhonetic)] );
print $k;
open my $f, '<', "$ENV{HOME}/Downloads/NamesList.txt" or die;
my $k = UI::KeyboardLayout::->new();
my ($d,$c,$names,$blocks,$extraComb,$uniVersion) = $k->parse_NameList($f);
close $f or die;
$k->print_decompositions($d);
$k->print_compositions ($c);
UI::KeyboardLayout::->set_NamesList("$ENV{HOME}/Downloads/NamesList.txt",
"$ENV{HOME}/Downloads/DerivedAge.txt");
my $l = UI::KeyboardLayout::->new();
$l->print_compositions;
$l->print_decompositions;
UI::KeyboardLayout::->set_NamesList("$ENV{HOME}/Downloads/NamesList-6.1.0d8.txt",
"$ENV{HOME}/Downloads/DerivedAge-6.1.0d13.txt");
my $l = UI::KeyboardLayout::->new_from_configfile('examples/EurKey++.kbdd');
for my $F (qw(US CyrillicPhonetic)) {
# Open file, select()
print $l->fill_win_template(1,['faces', $F]);
$l->print_coverage($F);
}
perl -wC31 UI-KeyboardLayout\examples\grep_nameslist.pl "\b(ALPHA|BETA|GAMMA|DELTA|EPSILON|ZETA|ETA|THETA|IOTA|KAPPA|LAMDA|MU|NU|XI|OMICRON|PI|RHO|SIGMA|TAU|UPSILON|PHI|CHI|PSI|OMEGA)\b" ~/Downloads/NamesList.txt >out-greek
AUTHORS
Ilya Zakharevich, ilyaz@cpan.org
DESCRIPTION
In this section, a "keyboard" has a certain "character repertoire" (the set of characters that may be entered using this keyboard), and a mapping associating each character in the repertoire with a keypress or with several (sequential or simultaneous) keypresses. A small enough keyboard may have a pretty arbitrary mapping and remain useful (witness QWERTY vs Dvorak vs Colemak). However, if a keyboard has a sufficiently large repertoire, there must be a strong logic ("orthogonality") in this association; otherwise most of the repertoire will not be useful (except for people who have an extraordinary memory, and are ready to invest part of it into the keyboard).
The "character repertoire" needs of different people vary enormously; observing the people around me, I get a very narrow point of view. But it is the best I can do; what I observe is that many of them would use 1000-2000 characters if they had a simple way to enter them, and that the needs of different people overlap only a little. So to be helpful to different people, a keyboard should have at least 2000-3000 different characters in its repertoire. (Some ballpark comparisons: MES-3B has about 2800 characters; the Adobe Glyph List corresponds to about 3600 Unicode characters.)
To access these characters, how much structure does one need to carry in memory? One can make a (trivial) estimate from below: on Windows, the standard US keyboard allows entering 100 - or 104 - characters (94 ASCII keys, SPACE, ENTER, TAB - moreover, C-ENTER, BACKSPACE and C-BACKSPACE also produce characters; so do C-[, C-] and C-\ C-Break in most layouts!). If one needs about 30 times more, one could do with 5 different ways to "mogrify" a character; if these mogrifications are "orthogonal", then there are 2^5 = 32 ways of combining them, and one could access 32*104 = 3328 characters.
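The estimate above can be checked with a few lines of Perl. This is only a sketch of the arithmetic; the variable names are chosen for illustration and are not part of the module.

```perl
use strict;
use warnings;

my $base_chars     = 104;   # characters reachable on a plain US layout (figure from the text)
my $mogrifications = 5;     # number of independent ("orthogonal") mogrifications

# Each mogrification is either applied or not, so there are 2**N combinations.
my $combinations = 2 ** $mogrifications;
my $reachable    = $combinations * $base_chars;

print "$combinations combinations, $reachable characters\n";
```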
Of course, the characters in a "reasonable repertoire" form a very amorphous mass; there is no way to introduce a structure like that which is "natural" (so that there is hope for "ordinary people" to keep it in memory). So the complexity of these mogrifications is not in their number, but in their "nature". One may try to decrease this complexity by having very easy-to-understand mogrifications - but then there is no hope of having 5 of them - or 10, or 15, or 20.
However, we know that many people are able to memorise the layout of 70 symbols on a keyboard. So would they be able to handle, for example, 30 different "natural" mogrifications? And how large a repertoire of characters would one be able to access using these mogrifications?
This module does not answer these questions directly, but it provides tools for investigating them, and tools to construct actual working keyboard layouts based on these ideas. It consists of the following principal components:
- Unicode table examiner
distills relations between different Unicode characters from the Unicode tables, and combines the results with user-specified "manual mogrification" rules. From these automatic/manual mogrifications, it constructs orthogonal scaffolding supporting Unicode characters (we call it composition/decomposition, but it is a major generalization of the corresponding Unicode consortium's terms).
- Layout constructor
allows building keyboard layouts based on the above mogrification rules, and on other visual and/or logical directives. It combines the bulk-handling ability of an automatic rule-based approach with the flexibility provided by a system of manual overrides. (The rules are read from a .kbdd Keyboard Description file.)
- System-specific software layouts
may be created based on the "theoretical layout" made by the layout constructor (currently only on Windows, and only via the KBDUTOOL route).
- Report/Debugging framework
creates human-readable descriptions of the layout, and/or debugging reports on how the layout creation logic proceeded.
The last (and, probably, the most important) component of the distribution is an example keyboard layout created using this toolset.
Keyboard description files
Syntax
I could not find an appropriate existing configuration file format, so I was forced to invent yet-another-config-file-format. Sorry...
A config file initializes a tree implementing a hash of hashes of hashes etc. whose leaves are either strings or arrays of strings, and whose keys are words. The file consists of "sections"; each section fills a certain hash in the tree.
Sections are separated by "section names", which are sequences of word characters and / (possibly empty) enclosed in square brackets. [] is the root hash, then [word] is the hash referenced by the key word in the root hash, then [word/another] is the hash referenced by the key another in the hash referenced by [word], etc. Additionally, a section separator may look like [visual -> wordsAndSlashes].
Sections are of two types: normal and visual. A normal section consists of comments (starting with #) and assignments. An assignment is in one of 4 forms:
word=value
+word=value
@word=value,value,value,value
/word=value/value/value/value
The first assigns the string value to the key word in the hash of the current section. The second adds a value to the array referenced by the key word; the other two add several values. Trailing whitespace is stripped.
Any string value without end-of-line characters and trailing whitespace can be added this way (and values without commas or without slashes can be added in bulk to arrays). In particular, there must be no whitespace before the = sign, and any whitespace after = is a part of the value.
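The four assignment forms can be illustrated by a small stand-alone parser. This is a minimal sketch written from the description above, not the module's own parser; the helper name apply_assignment and the sample keys are hypothetical, and treating empty lines as ignorable is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's parser: apply one assignment line
# of a normal section to the hash of the current section.
sub apply_assignment {
    my ($hash, $line) = @_;
    $line =~ s/\s+\z//;                        # trailing whitespace is stripped
    return if $line =~ /^#/ or $line eq '';    # comment, or (assumed) empty line
    if ($line =~ m{^([+\@/]?)(\w+)=(.*)\z}s) {
        my ($kind, $key, $value) = ($1, $2, $3);
        if    ($kind eq '')  { $hash->{$key} = $value }                           # word=value
        elsif ($kind eq '+') { push @{ $hash->{$key} }, $value }                  # +word=value
        elsif ($kind eq '@') { push @{ $hash->{$key} }, split /,/, $value, -1 }   # @word=v,v
        else                 { push @{ $hash->{$key} }, split m{/}, $value, -1 }  # /word=v/v
        return 1;
    }
    die "Cannot parse assignment: $line\n";
}

my %section;
apply_assignment(\%section, $_) for
    'name= My layout',          # the space after = stays in the value
    '+layers=Latin',
    '@layers=Greek,Cyrillic',
    '/paths=a,b/c,d';           # the slash form allows commas inside values
```

After these calls, name holds " My layout" (with the leading space), layers holds three elements, and paths holds the two elements "a,b" and "c,d".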
Visual sections consist of comments, assignments, and content, which is the rest of the section. Comments after the last assignment become part of the content. The content is preserved as a whole, and assigned to the key unparsed_data; trailing whitespace is stripped. (This is the way to insert a value containing end-of-line characters.)
In the context of this distribution, the intent of visual sections is to be parsed by a postprocessor. So the only purpose of explicit assignments in a visual section is to configure how the rest is parsed; after the parsing is done (and the result is copied elsewhere in the tree) these values should preferably not be used.
Semantics of visual sections
Two types of visual sections are supported: DEADKEYS and KBD. The content of a DEADKEYS section is just an embedded (part of a) .klc file. We can read deadkey mappings and deadkey names from such sections. The name of the section becomes the name of the mapping function which may be used inside a Diacritic_* rule (or in a recipe for a computed layer).
The content of a KBD section consists of #-comment lines and "mapping lines"; every "mapping line" encodes one row of a keyboard (in one or several layouts). (But the makeup of the rows of this keyboard may be purely imaginary; it is normal to have a "keyboard" with one row of numbers 0...9.) Configuration settings specify how many lines there are per row, how many layers are encoded by every line, and what the names of these layers are:
visual_rowcount # how many config lines per row of keyboard
visual_per_row_counts # Array of length visual_rowcount
visual_prefixes # Array of chars; <= visual_rowcount (miss=SPACE)
prefix_repeat # How many times prefix char is repeated (n/a to SPACE)
in_key_separator # If several layers per row, splits a key-descr
layer_names # Where to put the resulting keys array
in_key_separator2 # If one of entries is longer than 1 char, join by this
# (optional)
Each line consists of a prefix (which is ignored except for sanity checking), and a whitespace-separated list of key descriptions. (Whitespace followed by a combining character is not separating.) Each key description is split using in_key_separator into slots, one slot per layout. (The leading in_key_separator is not separating.) Each key/layout description consists of one or two entries. An entry is either two dashes -- (standing for empty), or a hex number of length >=4, or a string. (Hex numbers must be separated by . from neighboring word characters.) A lone character which has a different uppercase is auto-replicated in uppercase (more precisely, titlecase) form. A missing or empty key/layout description gives two empty entries (note that the leading key/layout description cannot be empty; same for "the whole key description" - use the leading --).
If one of the entries in a slot is a string of length ≥ 2, one must separate the entries by in_key_separator2. Likewise, if a slot has only one entry, and it is longer than 1 char, it must be started or terminated by in_key_separator2.
To simplify BiDi keyboards, a line may optionally be prefixed with the LRO/RLO character; if so, it may optionally be ended by spaces and the PDF character. For compatibility with other components, layer names should not contain the characters +()[].
Inclusion of .klc files
Instead of including a .klc file (or its part) verbatim in a visual section, one can make a section DEADKEYS/NAME/name1/nm2 with a key klc_filename. The named file will be included and parsed as a DEADKEYS visual section (with name DEADKEYS/name1/nm2 ???). (Currently only UTF-16 files are supported.)
Metadata
A metadata entry is either a string, or an array. A string behaves as if it were an array with the string repeated sufficiently many times. Each personality defines MetaData_Index, which chooses the element of the arrays. The entries
COMPANYNAME LAYOUTNAME COPYR_YEARS LOCALE_NAME LOCALE_ID
DLLNAME SORT_ORDER_ID_ LANGUAGE_NAME
should be defined in the personality section, or above this section in the configuration tree. (They are used when outputting Windows .klc files.)
Optional metadata currently consists only of the VERSION key (the protocol version; hardwired now as 1.0).
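The string-or-array rule for metadata can be pictured as follows. This is a sketch under the assumption that the configuration tree has already been read into plain Perl data; the helper metadata_value and the sample values are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: pick the metadata value for a personality.
# A plain string stands for an array repeating that string.
sub metadata_value {
    my ($entry, $index) = @_;
    return ref($entry) eq 'ARRAY' ? $entry->[$index] : $entry;
}

my %meta = (
    COMPANYNAME => 'Example Org',                      # same for every personality
    LAYOUTNAME  => ['Latin layout', 'Greek layout'],   # one per MetaData_Index
);

my $company = metadata_value($meta{COMPANYNAME}, 1);
my $layout  = metadata_value($meta{LAYOUTNAME},  1);
print "$company / $layout\n";
```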
Layer/Face/Prefix-key Recipes
The sections layer_recipes and face_recipes contain instructions on how to build Layers and Faces out of simpler elements. Similar recipes appear as values of DeadKey_* entries in a face. Such a "recipe" is executed with parameters: a base face name, a layer number, and a prefix character (the latter is undefined when the recipe is a layer recipe or face recipe). (The recipe is free to ignore the parameters; for example, most recipes ignore the prefix character even when they are "prefix key" recipes.)
The recipes and the visual sections are the most important components of the description of a keyboard group.
To construct layers of a face, a face recipe is executed several times with different "layer number" parameters. In contrast, in the simplest cases a layer recipe is executed once. However, when the layer is a part of a compound ("parent") recipe, it inherits the "parameters" from the parent. In particular, it may be executed several times with different face names (if used in different faces), or with different layer numbers (if used - explicitly or implicitly - in different layer slots; for example, Mutator(LayerName) in a face/prefix-key recipe will execute the LayerName recipe separately for all the layer numbers; or one can use Layers(Empty+LayerName) together with Layers(LayerName+Other)). Depending on the recipe, these calls may result in the same layout of the resulting layers, or in different layouts.
A recipe may be of three kinds: it is either a "first comer wins" recipe, which is a space-separated collection of simpler recipes, or SELECTOR(COMPONENTS), or a "mutator": MUTATOR(BASE) or just MUTATOR. All recipes must be ()-balanced and []-balanced; so must be the MUTATOR; in turn, the BASE is either a layer name, or another recipe. A layer name must be defined either in a visual KBD section, or be a key in the layer_recipes section (so it should not have +()[] characters), or be the literal Empty. When MUTATOR(BASE) is processed, first the resulting layer(s) of the BASE recipe are calculated; then the layer(s) are processed by the MUTATOR (one key at a time).
The most important SELECTOR keywords are Face (with a face name as its argument, defined either via a faces/FACENAME section, or via face_recipes) and Layers (with an argument of the form LAYER_NAME+LAYER_NAME+..., with layer names defined as above). Both select the layer (out of a face, or out of a list) with number equal to the "layer number parameter" in the context of the recipe. The FlipLayers builder is similar to Face, but chooses the "other" layer ("cyclically the next" layer if more than 2 are present).
The other selectors are Self, LinkFace and FlipLayersLinkFace; they operate on the base face or on the face associated with the base face.
The simplest forms of MUTATORS are Id, lc, uc, ucfirst, Empty (note that uc/lc/ucfirst return undefined when case-conversion results in no change; use maybe_uc/maybe_lc/maybe_ucfirst if one wants them to behave as Perl operators). Recall that a layer is nothing more than a structure associating a pair "unshifted/shifted character" to the key number, and that these characters may be undefined. These simplest mutators modify these characters independently of their key numbers and shift state (with Empty making all of them undefined). Similar user-defined simple mutators are ByPairs[PAIRS]; here PAIRS consists of pairs "FROM TO" of characters (with optional spaces between pairs); characters not appearing as FROM become undefined by ByPairs. (As usual, characters may be replaced by hex numbers with 4 or more hex digits; separate the number from a neighboring word character by . [dot].)
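The difference between uc and maybe_uc, and the behaviour of ByPairs, can be modelled on single characters. The functions below are hypothetical stand-ins written from the description above, not the module's code; in particular, this sketch of ByPairs does not handle the hex-number notation.

```perl
use strict;
use warnings;

# Hypothetical per-character models of the simple mutators; each maps one
# character (possibly undef) to one character (possibly undef).
sub strict_uc {                       # models the uc mutator: undef if nothing changes
    my $c = shift;
    return undef unless defined $c;
    my $u = uc($c);
    return $u eq $c ? undef : $u;
}
sub maybe_uc {                        # models maybe_uc: behaves as Perl's uc
    my $c = shift;
    return defined($c) ? uc($c) : undef;
}
sub by_pairs {                        # models ByPairs[PAIRS]: unlisted characters become undef
    my ($pairs, $c) = @_;
    (my $p = $pairs) =~ s/\s+//g;     # spaces between pairs are optional
    my %map = $p =~ /(.)(.)/gs;       # FROM TO FROM TO ...
    return defined($c) ? $map{$c} : undef;
}

my @demo = (strict_uc('a'), maybe_uc('1'), by_pairs('.: ,;', ','));
```

Here strict_uc('1') is undefined while maybe_uc('1') keeps the digit, which is the distinction the text draws between the two families.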
All mutators must have the form WORD or WORD[PARAMETERS], with PARAMETERS being (),[]-balanced. Other simple mutators are dectrl (converts a control-char [those between 0x00 and 0x1f] to the corresponding [uppercase] character), ShiftFromTo[FROM,TO] (adds a constant to the [numerical code of the] input character so that FROM becomes TO), SelectRX[PERL_REGEXP] (keeps input characters which match, converts everything else to undefined), FromTo[LAYER_FROM,LAYER_TO] (similar to ByPairs, but pairs all characters in the layers based on their position), DefinedTo[CHAR] (all defined characters are converted to CHAR).
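A per-character sketch of four of these mutators may make the descriptions more concrete. The function names are hypothetical, and one detail is an assumption: the text does not say what dectrl does with a character outside 0x00..0x1f, so this model returns undef for it.

```perl
use strict;
use warnings;

# Hypothetical per-character models of four simple mutators.
sub dectrl {                          # control char 0x00..0x1f -> "uppercase" char
    my $c = shift;
    return undef unless defined $c and ord($c) <= 0x1f;   # assumption: others -> undef
    return chr(ord($c) + 0x40);       # ^A (0x01) -> A (0x41)
}
sub shift_from_to {                   # models ShiftFromTo[FROM,TO]
    my ($from, $to, $c) = @_;
    return undef unless defined $c;
    return chr(ord($c) + ord($to) - ord($from));
}
sub select_rx {                       # models SelectRX[PERL_REGEXP]
    my ($rx, $c) = @_;
    return (defined($c) && $c =~ $rx) ? $c : undef;
}
sub defined_to {                      # models DefinedTo[CHAR]
    my ($char, $c) = @_;
    return defined($c) ? $char : undef;
}

my $shifted = shift_from_to('a', 'A', 'q');   # the constant is ord('A') - ord('a')
```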
The mutator Imported[NAME] is similar to ByPairs, but takes the .klc-style visual DEADKEYS/NAME section as the description of the mutation. NAME may be followed by a character as in NAME,CHAR; if not, CHAR is the prefix key from the recipe's execution parameters.
The simple mutator ByPairs has flavors: one can append Prefix or InvPrefix to the name, and the resulting characters become prefix keys (the “AltGr-inverted” prefix followed by CHAR behaves as the non-inverted prefix followed by AltGr-CHAR).
Some mutators pay attention not only to what the character is, but also to how it is accessible on the given key: such are FlipShift, FlipLayers, FromToFlipShift[LAYER_FROM,LAYER_TO]. Some other mutators also take into account how the key is positioned with respect to the other keys.
ByColumns[CHARS] assigns a character to a particular column of the keyboard. Which keys are in which columns is governed by how the corresponding visual layer is formatted (shifted to the right by the keyline_offsets array of the visual layer). This visual layer is the one associated with the face by the geometry_via_layer key (and the face is the parameter face of the mutator). CHARS is a comma-separated list; empty positions map to the undefined character.
ByRows[MUTATORS] chooses a mutator based on the row of the keyboard. On the top row, it is the first mutator which is chosen, etc. The list MUTATORS is separated by /// surrounded by whitespace.
The mutator InheritPrefixKeys[FACE_FROM] converts some non-prefix characters to prefix characters; the conversion happens if the argument of the mutator coincides with what is at the corresponding position in FACE_FROM, and this position contains a prefix character. (Nowadays this mutator is not very handy: most of its uses may be accomplished by having inheritable prefix characters in appropriate faces.)
The mutators NotId(BASEFACE FACES), NotSameKey(BASEFACE FACES) process their argument in a special way: the characters in FACES which duplicate the characters present (on the same key, and possibly with the same modifiers) in BASEFACE are ignored. The remaining characters are combined “as usual” with “the first comer wins”.
The most important mutator is Mutate (and its flavors). (See "The Mutate[RULES] mutator".)
Note that Id(LAYERNAME) is similar to a selector; it is the only way to insert a layer without a selector, since a bareword is interpreted as a MUTATOR; Id(LAYERNAME) is a synonym of Layers(LAYERNAME+LAYERNAME+...) (repeated as many times as there are layers in the parameter "base face").
The recipes in a space-separated list of recipes ("first comer wins") are interpreted independently to give a collection of layers to combine; then, for every key number and both shift states, one takes the leftmost recipe which produces a defined character for this position, and the result is put into the resulting layer.
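The "first comer wins" combination can be sketched on a simplified model of layers. The data layout (an array indexed by key number, holding [unshifted, shifted] pairs) follows the description of a layer given earlier, but the function name and the sample layers are hypothetical.

```perl
use strict;
use warnings;

# A layer here is an array indexed by key number; each element is a pair
# [unshifted, shifted] whose entries may be undef (a simplified model).
sub first_comer_wins {
    my @layers = @_;
    my $keys = 0;
    for my $layer (@layers) {
        $keys = @$layer if @$layer > $keys;
    }
    my @result;
    for my $key (0 .. $keys - 1) {
        for my $shift (0, 1) {
            # Take the leftmost layer with a defined character at this position.
            my ($first) = grep { defined } map { $_->[$key][$shift] } @layers;
            $result[$key][$shift] = $first;
        }
    }
    return \@result;
}

my $base   = [ ['a', 'A'], [undef, undef], ['c', undef] ];
my $extra  = [ ['x', 'X'], ['y', 'Y'],     ['z', 'Z']   ];
my $merged = first_comer_wins($base, $extra);
```

In $merged, key 0 comes entirely from $base, key 1 entirely from $extra, and key 2 is mixed: the unshifted character from $base and the shifted one from $extra.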
Keep in mind that to understand what a recipe does, one should trace its description in right-to-left order: for example, ByPairs[.:](FlipLayers) creates a layout where : is at the position of ., but on the second [=other] layer (essentially, if the base layout is the standard one, it binds the character : to the keypress AltGr-.).
To simplify formatting of .kbdd files, a recipe may be an array reference. The string may be split on spaces, or split after a comma or |.
The Mutate[RULES] mutator
The essence of Mutate is to have several mutation rules and choose the best of the results of applying these rules. Grouping the rules gives a flexible way to control what "the best" actually means. The rules may be separated by comma, by |, or by ||| (interchangeable with ||||).
In the simplest case of grouping, RULES form a |-separated list, and each group consists of one rule. Then the best result is the one coming from an earlier rule. In general, the groups are separated by |, and the rules inside a group are separated by commas; if more than one rule appears in a group, a different kind of competition appears (inside the group).
The quality of the generated characters is a list UNICODE_AGE, HONEST, UNICODE_BLOCK, IN_CASE_PAIR, FROM_NON_ALTGR_POSITION with lexicographical order (an earlier element is stronger than the ones after it). Here HONEST describes whether a character is generated by Unicode compositing (versus “compatibility compositing” or other “artificially generated” mogrifiers); the older age wins, as do honest compositing, earlier Unicode blocks, case pairs and characters from non-AltGr-positions. (Experience shows that these rules have a pretty good correlation with being “more suitable for human consumption”.)
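The lexicographical ordering of qualities can be sketched as a comparison function. The numeric encoding below is an assumption made for illustration (every slot is normalized so that a smaller number is better); the module's internal representation may differ, and the candidate names are hypothetical.

```perl
use strict;
use warnings;

# Assumed encoding: each quality is an array
# [UNICODE_AGE, HONEST, UNICODE_BLOCK, IN_CASE_PAIR, FROM_NON_ALTGR_POSITION],
# normalized so that a smaller number is better in every slot.
sub compare_quality {
    my ($p, $q) = @_;
    for my $i (0 .. $#$p) {
        my $cmp = $p->[$i] <=> $q->[$i];
        return $cmp if $cmp;          # the earliest differing slot decides
    }
    return 0;
}

my %candidates = (
    old_compat => [1.1, 1, 0, 1, 0],  # old character, compatibility composition
    new_honest => [6.0, 0, 0, 0, 0],  # newer character, honest composition
);
my ($best) = sort { compare_quality($candidates{$a}, $candidates{$b}) }
             keys %candidates;
```

Because the age slot comes first, the older character wins here even though it loses in the later slots.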
Moreover, quality in case-pairs is equalized by assigning the strongest quality of the two. Such pairs are always considered “tied together” when they compete with other characters. (In particular, if a single character with higher quality occupies one of the Shifted/Unshifted positions, a case pair with lower quality is completely ignored; so the “other” position may be taken by a single character with yet lower quality.)
In addition, the characters which lost the competition for non-AltGr-positions are considered again on AltGr-positions. (With boosted priority compared to mutated AltGr-characters; see above.)
This mutator comes in several flavors: one can append to its name SpaceOK/Hack/DupsOK/32OK (in this order). Unless SpaceOK is specified, it will not modify characters on a key which produces SPACE when used without modifiers. Unless 32OK is specified, it will not produce Unicode characters after 0xFFFF (the default is to follow the brain-damaged semantics of prefix keys on Windows). Unless DupsOK is specified, the result is optimized by removing duplicates (per key) generated by application of RULES. With the Hack modifier, the generated characters are not counted as “obtained by logical rules” when statistics for the generated keyboard layout are calculated.
Linked prefixes
On top of what is explained above, there is a way to arrange “linking” of two prefix keys; this linking allows characters which cannot fit on one (prefixed) key to “migrate” to unassigned positions on the otherwise-prefixed key. (This is similar to migration from a non-AltGr-position to an AltGr-position.) This is achieved by using mutator rules of the following form:
primary = +PRE-GROUPS1|||SHARED||||POST-GROUPS1
secondary = PRE-GROUPS2||||PRE-GROUPS1|||SHARED||||POST-GROUPS2
Groups with digits are not shared (they are specific to a particular prefix); SHARED is (effectively) reversed when accessed from the secondary prefix; for the secondary key, the recipes from SHARED which were used in the primary key are removed from SHARED, and are appended to the end of POST-GROUPS2; the PRE-GROUPS1 are skipped when finding assignments for the secondary prefix.
In the primary recipe, ||| and |||| are interchangeable with |. Moreover, if POST-GROUPS2 is empty, the secondary recipe should be written as
secondary = PRE-GROUPS2|||PRE-GROUPS1|||SHARED
If PRE-GROUPS1 is empty, this should be written as one of
secondary = PRE-GROUPS2|||SHARED
secondary = PRE-GROUPS2||||SHARED
secondary = PRE-GROUPS2||||SHARED||||POST-GROUPS2
These rules are to allow macro-ization of the common parts of the primary and secondary recipes. Put the common parts as a value of the key Named_DIA_Recipe__*** (here *** denotes a word), and replace them by the macro <NAMED-***> in the recipes.
Implementation: the primary key recipe starts with the + character; it forces interpretation of ||| and |||| as ordinary |.
If not primary, the top-level groups are formed by |||| (if present), otherwise by |||. The number of top-level groups should be at most 3. The second of the ||||-groups may have at most 2 |||-groups; there should be no other subdivision. This way, there may be up to 4 groups with different roles.
The second of 3 toplevel |||-groups, or the first of two sublevel |||-groups, is the “skip” group. The last of two or three toplevel |||-groups (or of sublevel |||-groups, or the 2nd toplevel ||||-group without subdivisions) is the inverted group; the 3rd of the toplevel ||||-groups is the “extra” group.
“Penalize/prohibit” lists start anew in every top-level group.
Atomic mutator rules
As explained above, the individual RULES in Mutate[RULES] may be separated by , or |, or ||| or ||||. Such an individual rule is a combination of atomic rules combined by + operators, and/or preceded by a - prefix (with the understanding that +- must be replaced by --). The prefix - means inversion of the rule; the operator + is the composition of the rules.
Example: the atomic rule <super> converts its input character into its superscript forms (if such forms exist; for example, a may be converted to ᵃ or ª). The atomic rules lc, uc, ucfirst behave the same as the corresponding MUTATORs. The atomic rule dectrl converts a control-character to the corresponding “uppercase” character: ^A is converted to A, and ^\ is converted to \. (The last 4 rules cannot be inverted by -.)
The composition is performed (as usual) from right to left. Example: the individual rule <super>+lc+dectrl converts ^A to ᵃ or ª.
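The right-to-left composition can be traced with a toy model. Everything here is illustrative: the rule table is hypothetical, the superscript map is a two-entry sample rather than data from the Unicode tables, each rule returns a single character, and the - (inversion) prefix is not modelled.

```perl
use strict;
use warnings;

# Toy atomic rules; the superscript table is a tiny hypothetical sample.
my %super = (a => "\x{1D43}", o => "\x{1D52}");
my %rule = (
    'dectrl'  => sub { my $c = shift; ord($c) <= 0x1f ? chr(ord($c) + 0x40) : undef },
    'lc'      => sub { my $c = shift; my $l = lc($c); $l eq $c ? undef : $l },
    '<super>' => sub { my $c = shift; $super{$c} },
);

sub apply_rule {
    my ($spec, $c) = @_;
    for my $name (reverse split /\+/, $spec) {   # the rightmost rule is applied first
        return undef unless defined $c;
        $c = $rule{$name}->($c);
    }
    return $c;
}

# ^A -> A (dectrl) -> a (lc) -> U+1D43 MODIFIER LETTER SMALL A (<super>)
my $out = apply_rule('<super>+lc+dectrl', "\x01");
```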
In addition to rules listed above, the atomic rules may be of the following types:
- A hex number with ≥4 digits, or a character: implements the composition inverting (compatibility or not) Unicode decompositions into two characters; the character in the rule must be the first character of the decomposition. Here “Unicode decompositions” are either deduced from Unicode decomposition rules (with compatibility decompositions having lower priority), or deduced by splitting the name of the character into parts.
- <pseudo-upgrade> is an inversion of a Unicode decomposition which goes from 1 character to 1 character.
- Flavors of characters <FLAVOR> from Unicode tables come from Unicode 1-character to 1-character decompositions marked with <FLAVOR>. Example: <sub> for a subscript form; or <final>.
- <font=***> rules TBC ..........................................
- Calculated rules <pseudo-calculated-***> are extracted by a heuristic algorithm which tries to parse the Unicode name of the character.
For the best understanding of what these rules produce, inspect the results of the print_compositions() and print_decompositions() methods documented in "SYNOPSIS". The following “keywords” are processed by the algorithm:
  - WITH, OVER, ABOVE, PRECEDED BY, BELOW (only with LONG DASH) are separators;
  - COMBINING CYRILLIC LETTER, BARRED, SLANTED, APPROXIMATELY, ASYMPTOTICALLY, SMALL (not near LETTER), ALMOST, SQUARED, BIG, N-ARY, LARGE, LUNATE, SIDEWAYS DIAERESIZED, SIDEWAYS OPEN, INVERTED, ARCHAIC, EPIGRAPHIC, SCRIPT, LONG, MATHEMATICAL, AFRICAN, INSULAR, VISIGOTHIC, MIDDLE-WELSH, BROKEN, TURNED, INSULAR, SANS-SERIF, REVERSED, OPEN, CLOSED, DOTLESS, TAILLESS, FINAL BAR, SYMBOL, OPERATOR, SIGN, ROTUNDA, LONGA, IN TRIANGLE, SMALL CAPITAL (as smallcaps) are modifiers.
  - For an APL FUNCTIONAL SYMBOL, one scans for QUAD, UNDERBAR, TILDE, DIAERESIS, VANE, STILE, JOT, OVERBAR, BAR TBC ..........................................
- Additionally, esh/eng/ezh are considered pseudo-phonetized variants of their middle letter, as well as SCHWA of 0.
- <pseudo-fake-***> rules are obtained by scanning the name for WHITE, BLACK, CIRCLED, BUT NOT as well as for UM (as umify), paleo-Latin digraphs and CON/VEND (as paleocontraction-by-last), doubled-letters (as doubleletter), MIDDLE-WELSH doubled-letters (as doubleletter-middle-welsh), MODIFIER LETTER (possibly with RAISED or LOW; as sub/super).
- Manual prearranged rules TBC ..........................................
- <subst-***> Explicit named substitution rules TBC ..........................................
- <reveal-substkeys> Prohibits handling non-substituted input TBC ..........................................
- <any-***> rules TBC ..........................................
Input substitution in atomic rules
TBC ..........................................
The Mutate2Self mutator
TBC ..............................
Pseudo-mutators for generation of documentation
A few mutators do not introduce any characters (in other words, they behave as Empty) but are used for their side effects: in prefix-key recipes, PrefixDocs[STRING] introduces documentation of what the prefix key is intended for. Likewise, HTML_classes[HOW] allows adding CSS classes to highlight parts of the HTML output generated by this module, namely the parts corresponding to selected characters in a face.
HOW is a comma-separated list, every triple in the list being WHERE,HTML_CLASS,CHARACTERS. WHERE is one of k/K (which add formatting to the key containing one of the CHARACTERS) or c/C (which add formatting to an individual character displayed on the key); one can add a digit to WHERE to limit it to a particular layer in the face (useful when a character appears several times in a face). The lower-case variants select characters based on the base face of a key. One can also append =CONTEXT to WHERE; then the class is added only if CONTEXT appears as one of the options for the HTML output generator.
The CSS rules generated by this module support several classes directly; the rest should be supported by user-supplied rules. The classes with existing support are:
On keys
to_w from_w # generate arrows between keys
from_nw from_ne to_nw to_ne # generate arrows between keys; will yellow-outline
pure # unless combined with this
red-bg green-bg blue-bg # tint the key as the whole (as background)
On characters
very-special need-learn may-guess # provide green/brown/yellow-outlines
special # provide blue outline (thick unless combined with
thinspecial # <-- this)
Extra CSS classes for documentation
In addition, several CSS classes are auto-generated based on Unicode properties of the character. TBC ........................
Debugging mutators
If the bit 0x40 of the environment variable UI_KEYBOARDLAYOUT_DEBUG (decimal or 0xHEX) is set, debugging output for mutators is enabled:
r ║ ║ ┆ ║ ṙ ṛ ┆ ║ ║ ║ ║ ⓡ ┆
║ ║ ┆ ║ Ṙ Ṛ ┆ ║ ║ ║ ║ Ⓡ ┆
║ ║ ặ ┆ ║ ┆ ║ ║ ║ ║ ┆
║ ║ Ặ ┆ ║ ┆ ║ ║ ║ ║ ┆
Extracted [ …list… ] deadKey=00b0
The output contains a line per character assigned to the keyboard key (if there are 2 layers, each with lc/uc variants, there are 4 lines); empty lines are omitted. The first column indicates the base character (lc of the 1st layer) of the key; the separator ║ indicates |-groups in the mutator. Above, the first group produces no mutations, the second group mutates only the characters in the second layer, and the third group produces two mutations per character in the first layer. The 7th group also produces mogrifications on the 1st layer.
The next example clarifies the ┆-separator: to the left of it are mogrifications which come in case pairs, to the right are mogrifications where the mogrified-lc is not a case pair of the mogrified-uc:
t ║ ║ ᵵ ║ ꞇ ┆ ʇ ║ ┆ ║
║ ║ ║ Ꞇ ┆ ᴛ ║ ┆ ║
║ ║ ║ ┆ ║ ꝧ ┆ ║
║ ║ ║ ┆ ║ Ꝧ ┆ ║
Extracted [ …list… ] deadKey=02dc
In this one, │ separates mogrifications with different priorities (based on Unicode ages, whether the atomic mutator was a compatibility/synthetic one, and the Unicode block).
/ ║ ║ ║ ║ ║ │ ∴ ║ ║
║ ║ ║ ║ ║ │ ≘ ≗ ║ ║
║ ║ ║ ║ ║ / │ ⊘ ║ ║
Extracted [ …list… ] deadKey=00b0
For secondary mogrifiers, where the distinction between ||| and | matters, some of the ║-separators are replaced by ┃. Additionally, there are two rounds of extraction: first the characters corresponding to the primary mogrifier are TMP-extracted (from the groups PRE-GROUPS1, COMMON); then what was extracted from COMMON is put back at the effective end (at the end of POST-GROUPS2, or, if there is none, at the beginning of COMMON):
t ║ ║ ᵵ ┃ ┃ ʇ │ │ ꞇ ┆ ║
║ ║ ┃ ┃ │ ᴛ │ Ꞇ ┆ ║
║ ║ ┃ ┃ │ │ ꝧ ┆ ║
║ ║ ┃ ┃ │ │ Ꝧ ┆ ║
TMP Extracted: <…list…> from layers 0 0 | 0 0
t ║ ║ ᵵ ┃ ꞇ ┆ ʇ ┋ ┃ ┆ │ ┆ │ ┆ ║
║ ║ ┃ Ꞇ ┆ ᴛ ┋ ┃ ┆ │ ┆ │ ┆ ║
║ ║ ┃ ┆ ┋ ┃ ┆ │ ┆ │ ꝧ ┆ ║
║ ║ ┃ ┆ ┋ ┃ ┆ │ ┆ │ Ꝧ ┆ ║
Extracted [ …list… ] deadKey=02dc
In the second part of the debugging output, the part of COMMON which is put back is separated by ┋.
When bit 0x80 is set, much more lower-level debugging info is printed. The arrays at separate depth mean: group number, priority, not-cased-pair, layer number, subgroup, is-uc. When bit 0x100 is set, the debugging output for combining atomic mutators is enabled.
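The three debugging bits described in this section can be decoded from the environment variable as sketched below. The bit values and the "decimal or 0xHEX" format come from the text; the helper debug_flags and the flag names are labels invented for this sketch, not identifiers from the module.

```perl
use strict;
use warnings;

# Parse a decimal or 0xHEX setting; the flag names are descriptive labels
# chosen for this sketch, not identifiers from the module.
sub debug_flags {
    my $raw = shift;
    $raw = 0 unless defined $raw and length $raw;
    my $bits = $raw =~ /^0x/i ? hex($raw) : 0 + $raw;
    return {
        mutators        => ($bits & 0x40)  ? 1 : 0,   # mutator debugging output
        low_level       => ($bits & 0x80)  ? 1 : 0,   # lower-level info
        combining_rules => ($bits & 0x100) ? 1 : 0,   # combining atomic mutators
    };
}

my $flags = debug_flags($ENV{UI_KEYBOARDLAYOUT_DEBUG});
print join(', ', map { "$_=$flags->{$_}" } sort keys %$flags), "\n";
```

For example, a setting of 0xC0 (or 192 in decimal) enables both the mutator output and the lower-level output.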
Personalities
A personality NAME is defined in the section faces/NAME. (NAME may include slashes - untested???)
An array layers gives the list of layers forming the face. (As of version 0.03, only 2 layers are supported.) The string LinkFace is a face.........
Substitutions
In the section Substitutions one defines composition rules which may be used on a par with composition rules extracted from the Unicode Character Database. An array FOO is converted to a hash accessible as <subst-FOO> from a Diacritic filter of the satellite face processor. An element of the array must consist of two characters (the first is mapped to the second one). If both characters have upper-case variants, the translation between these variants is also included.
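The conversion described above can be sketched as follows. This is only an illustration of the stated rule, not the module's code: the helper name subst_array_to_hash and the sample pairs are hypothetical.

```perl
use strict;
use warnings;
use utf8;
use feature 'unicode_strings';

# Hypothetical helper (not part of UI::KeyboardLayout): turn an array of
# two-character strings into a substitution hash. The upper-case pair is
# added only when both characters have a distinct upper-case variant.
sub subst_array_to_hash {
    my @pairs = @_;
    my %map;
    for my $pair (@pairs) {
        my @chars = split //, $pair;
        die "Expected exactly two characters in '$pair'" unless @chars == 2;
        my ($from, $to) = @chars;
        $map{$from} = $to;
        my ($FROM, $TO) = (uc $from, uc $to);
        $map{$FROM} = $TO if $FROM ne $from and $TO ne $to;
    }
    return \%map;
}

# Sample (made-up) Substitutions array: a -> ae ligature, o -> oe ligature, + -> plus-minus
my $subst = subst_array_to_hash('aæ', 'oœ', '+±');
```

Here the entry for + gets no upper-case counterpart, since neither character has one.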
Classification of diacritics
The section Diacritics contains arrays each describing a class of diacritic marks. Each array may contain up to 7 elements, each consisting of diacritic marks in the order of similarity to the "principal" mark of the array. Combining characters may be preceded by horizontal space. The seven elements should contain:
Surrogate chars; 8bit chars; Modifiers
Modifiers below (or above if the base char is below)
Vertical (or Comma-like or Doubled or Dotlike or Rotated or letter-like) Modifiers
Prime-like or Centered modifiers
Combining
Combining below (or above if base char is below)
Vertical combining and dotlike Combining
These lists determine what a Diacritic2Self filter of the satellite face processor will produce when followed by whitespace characters (possibly with modifiers) SPACE ENTER TAB BACKSPACE. (So, if the .kbdd file uses Diacritic2Self, this determines what diacritic prefix keys produce.)
Compose Key
The scalar configuration variable ComposeKey controls the ID of the prefix key to access .Compose composition rules. The rules are read from files listed in a class/object variable; set this variable with
$self->set__value('ComposeFiles', [@Files]); # Class name (instead of $self) is OK here
The format of the files is the same as for X11’s .Compose (but includes are not supported); only compositions starting with <Multi_key>, having no deadkeys, and (on Windows) expanding to 1 UTF-16 code unit are processed. (See the “systematic” parts of rules in the standard .XCompose: the lines with postfix s.)
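The three acceptance conditions above can be sketched as a line filter. This is a sketch under stated assumptions, not the module's parser: the function name parse_compose_line is hypothetical, the input is assumed to be an already decoded character string in the usual X11 Compose line shape, and escapes inside the quoted result are not decoded (so such lines are simply skipped).

```perl
use strict;
use warnings;

# Hypothetical filter for lines of the shape
#   <Multi_key> <a> <e> : "RESULT" ae   # comment
# Returns (\@keys, $result) for accepted lines, an empty list otherwise.
sub parse_compose_line {
    my ($line) = @_;
    return if $line =~ /^\s*(?:#|include\b|$)/;        # comments, includes, blank lines
    my ($lhs, $result) =
        $line =~ /^\s*((?:<\w+>\s*)+):\s*"((?:[^"\\]|\\.)*)"/
        or return;
    my @keys = $lhs =~ /<(\w+)>/g;
    # The sequence must start with the Compose key and have something after it
    return unless @keys >= 2 and lc $keys[0] eq 'multi_key';
    shift @keys;
    return if grep { /^dead_/ } @keys;                  # no dead keys allowed
    # The result must fit into one UTF-16 code unit
    return unless length($result) == 1 and ord($result) <= 0xFFFF;
    return (\@keys, $result);
}
```

A line such as `<Multi_key> <a> <e> : "æ" ae` passes the filter, while sequences containing a `dead_*` keysym or producing a character outside the Basic Multilingual Plane are dropped.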
Repeating this prefix twice accesses characters via their HTML/MathML entity names. The files are as above (the variable name is EntityFiles); the format is the same as in bycodes.html.
Repeating this prefix 3 times accesses characters via their rfc1345 codes; the variable rfc1345Files contains files in the format of rfc1345.html. It is recommended to download these files (or the later flavors):
http://www.x.org/releases/X11R7.6/doc/libX11/Compose/en_US.UTF-8.html
http://www.w3.org/TR/xml-entity-names/bycodes.html
http://tools.ietf.org/html/rfc1345
See "SYNOPSIS" for an example. Note that this mechanism does not assign this prefix key to any particular position on the keyboard layout; this should be done elsewhere. Implementation detail: if some of these 3 maps cannot be created, they are skipped (so fewer than 3 chained maps are created).
For more control, one can make this configuration variable into an array. The value KEY is equivalent to the array with elements
KEY,,ComposeFiles,dotcompose,warn
,KEY,EntityFiles,entity,warn
,KEY,rfc1345Files,rfc1345,warn
The five comma-separated elements are: the global access prefix, the prefix for access from the previous element (chained access), the variable controlling the filelist, the type of files in the filelist, and whether to warn when a particular flavor of composition table could not be loaded.
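The five-field structure can be illustrated by splitting the three default elements shown above. The field names and both helper functions are hypothetical labels chosen for this sketch; the module itself may store these values differently.

```perl
use strict;
use warnings;

# Hypothetical labels for the five comma-separated fields
my @FIELDS = qw(global_prefix chained_prefix files_variable file_type warn);

# The array which, per the text above, a scalar value KEY is equivalent to
sub expand_compose_key {
    my ($key) = @_;
    return ("$key,,ComposeFiles,dotcompose,warn",
            ",$key,EntityFiles,entity,warn",
            ",$key,rfc1345Files,rfc1345,warn");
}

sub parse_compose_element {
    my ($element) = @_;
    my @values = split /,/, $element, -1;    # limit -1 keeps empty fields
    die "Expected 5 fields in '$element'" unless @values == 5;
    my %record;
    @record{@FIELDS} = @values;
    return \%record;
}

my @parsed = map { parse_compose_element($_) } expand_compose_key('KEY');
```

Only the first element has a non-empty global prefix; the other two are reachable by pressing KEY again after the previous element's prefix, which matches the "repeat twice / 3 times" behavior described earlier.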
Names of prefix keys
Section DEADKEYS defines naming of prefix keys. If not named there (or in processed .klc files), the PrefixDocs property will be used; if none, the Unicode name of the character will be used.
More than 2 layers and/or exotic modifier keys
This is controlled by output_layers, mods_keys_KBD, and layers_mods_keys configuration arrays. TBC..................................
CAVEATS for German/French/BÉPO/Neo keyboards
Non-US keycaps: the character "a" is on (VK_)A, but its scancode is now different. E.g., French's A is on 0x10, which is US's Q. Our table of scancodes is currently hardwired. Some pictures and tables are available on
http://bepo.fr/wiki/Pilote_Windows
With this module, the scancode and the VK_-code for a position in a layout are calculated via the BaseLayer configuration variable; the first recognized character at the given position of this layer is translated to the VK_-code (using a hardwired table). The mapping of VK_-codes to scancodes is currently hardwired.
For “unusual” keys, one can use the VK subsection of the face to describe its scancode (the first entry in the array) and the bindings. If the scancode is empty, the name of the key is translated to a scancode using the hardwired tables.
Keyboards: on ease of access (What makes an easy-to-use keyboard layout)
The content of this section has no direct relationship to the functionality of this module. However, we feel that it is better that the user of this module understands these concerns. Moreover, it is these concerns which lead to the principles underlying the functionality of this module.
On the needs of keyboard layout users
Let's start with trivialities: different people have different needs with respect to keyboard layouts. For a moment, ignore the question of the repertoire of characters available via keyboard; then the most crucial distinction corresponds to a certain scale. In the absence of a better word, we use a provisional name "the required typing speed".
One example of people on the "quick" (or "rabid"?) pole of this scale are people who type a lot of text which is either "already prepared", or for which the "quality of prose" is not crucial. Quite often, these people may type in excess of 100 words per minute. For them, the most important questions are of physical exhaustion from typing. The position of the most frequent letters relative to the "rest" finger position, whether letters frequently typed together are on different hands (or at least not on the same/adjacent fingers), the distance fingers must travel when typing common words, how many keypresses are needed to reach a letter/symbol which is not "on the face of the keyboard" - their primary concerns are of this kind.
On the other, "deliberate", pole these concerns cease to be crucial. On this pole are people who type while they "create" the text, and what takes most of their focus is this "creation" process. They may "polish their prose", or the text they write may be overburdened by special symbols - anyway, what they concentrate on is not the typing itself.
For them, the details of the keyboard layout are important mostly in the relation to how much they distract the writer from the other things the writer is focused on. The primary question is now not "how easy it is to type this", but "how easy it is to recall how to type this". The focus transfers from the mechanics of finger movements to the psycho/neuro/science of memory.
These questions are again multifaceted: there are symbols one encounters every minute; after you recall once how to access them, most probably you won't need to recall them again - until you have a long interval when you do not type. The situation is quite different with symbols you need once per week - most probably, each time you will need to recall them again and again. If such rarely used symbols/letters are frequent (since many of them appear), it is important to have an easy way to find how to type them; on the other hand, probably there is very little need for this way to be easily memorizable. And for symbols which you need once per day, one needs both an easy way to find how to type them, and the way to type them should better be easily memorizable.
Now add to this the fact that for different people (so: different usage scenarios) this division into "all the time/every minute/every day/every week" categories is going to be different. And one should not forget the important scenario of going on vacation: when you return, you need to "reboot" your typing skills from the dormant state.
On “mixing” several “allied” layouts
On the other hand, note that the questions discussed above are more or less orthogonal: if the logic of recollection requires ω to be related in some way to the W-key, then it does not matter where the W-key is on the keyboard - the same logic is applicable to the QWERTY base layout, or the BÉPO one, or Colemak, or Dvorak. This module concerns itself only with the questions of "consistency" and the related question of "the ease of recall"; we care only about which symbols relate to which "base keys", and do not care about where the base keys sit on the physical keyboard.
EXCEPTIONS: The “main island” of the keyboard contains a 4×10 rectangle of keys. So if a certain collection of special keys may be easily memorized as a rectangular table, it is nice to be able to map this table to the physical keyboard layout. This module contains a tool making this task easy.
Now consider the question of the character repertoire: a person may need ways to type "continuously" in several languages; quite often one must type a “standalone” foreign word in a sentence; in addition to this, there may be a need to occasionally type "standalone" characters or symbols outside the repertoire of these languages. Moreover, these languages may use different scripts (such as Polish/Bulgarian/Greek/Arabic/Japanese), or may share a "bulk" of their characters, and differ only in some "exceptional letters". To add insult to injury, these "exceptional letters" may be rare in the language (such as ÿ in French or à in Swedish) or may have a significant letter frequency (such as é in French) or be somewhere in between (such as ñ in Spanish).
And the non-language symbols do not need to be the math symbols (although often they are). An English-language discussion of etymology at the coffee table may lead to a need to write down a word in polytonic Greek, or Old Norse; the next moment one would need to write a phonetic transcription in IPA/APA symbols. A discussion of keyboard layout may involve writing down symbols for non-character keys of the keyboard. A typography freak would optimize a document by fine-tuned whitespaces. Almost everybody needs arrow symbols, and many people would use box drawing characters if they had simple access to them.
Essentially, this means that as long as it does not impact other accessibility goals, it makes sense to have unified memorizable access to as many symbols/characters as possible. (An example of impacting other aspects: Microsoft's (and IBM's) "US International" keyboards steal characters `~'^": typing them produces "unexpected results" - they are deadkeys. This significantly simplifies entering characters with accents, but makes it harder to enter non-accented characters.)
The simplest rules of design of “large” keyboard layouts
One of the best known principles of design of human-machine interaction is that "simple common tasks should be simple to perform, and complicated tasks should be possible to perform". I strongly disagree with this principle - IMO, it lacks a very important component: "a gradual increase in complexity". When a certain way of doing things is easy to perform, and another similar way is still "possible to perform", but on a very elevated level of complexity, this leads to a significant psychological barrier erected between these two ways. Even when switching from the first way to the other one has significant benefits, this barrier leads to self-censorship. Essentially, people will ignore the benefits even if they exceed the penalty of "the elevated level of complexity" mentioned above. And IMO self-censorship is the worst type of censorship. (There is a certain similarity between this situation and that of "self-fulfilling prophecies". "People won't want to do this, so I would not make it simpler to do" - and now people do not want to do this...)
So I would add another clause to the law above: "and moderately complicated tasks should remain moderately hard to perform". What does it tell us in the situation of keyboard layout? One can separate several levels of complexity.
- Basic:
-
There should be some "base keyboards": keyboard layouts used for continuous typing in a certain language or script. Access from one base keyboard to letters of another should be as simple as possible.
- By parts:
-
If a symbol can be thought of as a combination of certain symbols accessible on the base keyboard, one should be able to "compose" the symbol: enter it by typing a certain "composition prefix" key then the combination (as far as the combination is unambiguously associated to one symbol).
The "thoughts" above should be either obvious (as in "combining a and e should give æ") or governed by simple mnemonic rules; the rules should cover as wide a range as possible (as in "Greek/Coptic/Hebrew/Russian letters are combined as G/C/H/R and the corresponding Latin letter; the correspondence is phonetic, or, in presence of conflicts, visual").
- Quick access:
-
As many non-basic letters as possible (of those expected to appear often) should be available via shortcuts. The same should apply to starting sequences of composition rules (such as "instead of typing StartCompose and ' one can type AltGr-'").
- Smart access:
-
Certain non-basic characters may be accessible by shortcuts which are not based on composition rules. However, these shortcuts should be deducible by using simple mnemonic rules (such as "to get a vowel with `-accent, type AltGr-key with the physical keyboard's key sitting below the vowel key").
- Superdeath:
-
If everything else fails, the user should be able to enter a character by its Unicode number (preferably in the most frequently referenced format: hexadecimal).
NOTE: This does not seem to be easily achievable, but it looks like a very nifty UI: a certain HotKey is reserved (e.g., AltGr-AppMenu); when it is tapped, and a character-key is pressed (for example, B) a menu-driven interface pops up where the user may navigate to different variants of B, Beta, etc - each of the variants with a hotkey to reach NOW, and with instructions how to reach it later from the keyboard without this UI.
Also: if a certain timeout passes after pressing the initial HotKey, an instruction what to do next should appear.
The finer rules of design of “large” keyboard layouts
Here are the finer points elaborating on the levels of complexity discussed above:
It looks reasonable to allow "fuzzy mnemonic rules": the rules which specify several possible variants where to look for the shortcut (up to 3-4 variants). If/when one forgets the keying of the shortcut, but remembers such a rule, a short experiment with these positions allows one to reconstruct the lost memory.
-
The "base keyboards" (those used for continuous typing in a certain language or script) should be identical to some "standard" widely used keyboards. These keyboards should differ from each other in position of keys used by the scripts only; the "punctuation keys" should be in the same position. If a script B has more letters than a script A, then a lot of "punctuation" on the layout A will be replaced by letters in the layout B. This missing punctuation should be made available by pressing a modifier (AltGr? compare with Microsoft's Vietnamese keyboard's top row).
-
If more than one base keyboard is used, there must be a quick access: if one needs to enter one letter from layout B when the active layout is A, one should not be forced to switch to B, type the letter, then switch back to A. It should better be available also on a prefixed combination "Quick_Access_Key letter".
-
One should consider what the Quick_Access_Key does when the layouts A and B are identical on a particular key (e.g., punctuation). One can go with the "Occam's razor" approach and make the Quick_Access_Key prefix into the do-nothing identity map. The alternative is to make it access some symbols useful both for script A and script B. It is a judgement call.
Note that there is a gray area when layouts A and B are not identical, but a key K produces punctuation in layout A, and a letter in layout B. Then when in layout B, this punctuation is available on AltGr-key, so, in principle, Quick_Access_Key would duplicate the functionality of AltGr. Compare with "there is more than one way to do it" below; remember that the OS (or misbehaving applications) may make some keypresses "unavailable". I feel that in these situations, “having duplication” is a significant advantage over “having some extra symbols available”.
-
The considerations in the two preceding parts are applicable also in the case when there are more “allied” layouts than A and B. Ways to make it possible are numerous: one can have several alternative Quick_Access_Key’s, and one can use a repeated prefix key Quick_Access_Key. With a large enough collection of layouts, a combination of both approaches may be visualized as a chain of layouts…
L_Quick³ L_Quick² L_Quick Base R_Quick R_Quick² R_Quick³
…here we have two quick access prefix keys, the left one L_Quick, and the right one R_Quick. Superscripts ² ³ … mean “pressing the prefix key several times”; the prefix keys move one left/right along the chain of layouts.
-
The three preceding parts were concerned with entering one character from an “allied” layout. To address another frequent need, entering one word from an “allied” layout, yet another approach may be needed. The solution may be to use a certain combination of modifier keys. (How to choose useful combinations? See: "A convenient assignment of KBD* bitmaps to modifier keys".)
(Using “exotic” modifier keys may be impossible in some badly coded applications. This should not stop one from implementing this feature: sometimes one has a choice from several applications performing the same task. Moreover, since this feature is a “frill”, there is no pressing need to have it always available.)
-
Paired symbols (such as ≤≥, «», ‹›, “”, ‘’) should be put on paired keyboard's keys: <> or [] or ().
-
"Directional symbols" (such as arrows) should be put either on numeric keypad or on a 3×3 subgrid on the letter-part of the keyboard (such as QWE/ASD/ZXC). (Compare with [broken?] implementation in Neo2.)
-
For symbols that are naturally thought of as sitting in a table, one can create an intuitive mapping of quite large tables to the keyboard. Split each key in halves by a horizontal line, think of Shift-key as sitting in the top half. Then ignoring the `~ key and most of the punctuation on the right hand side, the keyboard becomes an 8×10 grid. Taking into account the AltGr modifier (either as an extra bit, or as splitting a key by a horizontal line), one can map up to an 8×10×2 (or 8×20) table to a keyboard.
Example: Think of IPA consonants.
-
Cheatsheets are useful. And there are people who are ready to dedicate a piece of their memory to where on a layout a symbol particularly useful to them sits. So even if there is no logical position for a certain symbol, but there is an empty slot on the layout, one should not hesitate to use this slot.
However, this will be distracting to people who do not want to dedicate their memory to "special cases". So it makes sense to have three kinds of cheatsheets for layouts: one with special cases ignored (useful for most people), one with all general cases ignored (useful for checks "is this symbol available in some place I do not know about" and for memorization), and one with all the bells and whistles.
(Currently this module allows emitting HTML keyboard layouts with such information indicated by classes in markup. The details may be treated by the CSS rules.)
-
"There is more than one way to do it" is not a defect, it is an asset. If it is a reasonable expectation to find a symbol X on keypress K', and the same holds for keypress K'' and they both do not conflict with other "being intuitive" goals, go with both variants. Same for 3 variants, 4 - now you get my point.
Example: The standard Russian phonetic layout has Ё on the ^-key; on the other hand, Ё is a variant of Е; so it makes sense to have Ё available on AltGr-Е as well. Same for Ъ and Ь.
-
Dead keys which are "abstract" (as opposed to being related to letters engraved on the physical keyboard) should better be put on modified states of "zombie" keys of the keyboard (SPACE, TAB, CAPSLOCK, MENU_ACCESS).
NOTE: Making Shift-Space a prefix key may lead to usability issues for people used to typing CAPITALIZED PHRASES by keeping Shift pressed all the time. As a minimum, the symbols accessed via Shift-SPACE key should be strikingly different from those produced by key so that such problems are noted ASAP. Example: at first sight, producing NO-BREAK SPACE on Shift-Space Shift-Space or Shift-Space Space looks like a good idea. Do not do this: the visually indistinguishable NO-BREAK SPACE would lead to very hard-to-debug problems if it was unintentional.
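The table-to-keyboard mapping from the list above (split every key of the 4×10 main island into a Shift-ed top half and a plain bottom half, giving an 8×10 grid) can be sketched as below. The function name and the choice of a US QWERTY island are assumptions made for this illustration; this is not the module's own tool.

```perl
use strict;
use warnings;

# The 4x10 "main island" of a US QWERTY keyboard, top row first
# (the `~ key and the right-hand punctuation are ignored, as in the text)
my @ISLAND = ('1234567890', 'qwertyuiop', 'asdfghjkl;', 'zxcvbnm,./');

# Map cell ($row, $col) of an 8x10 table to ($shift, $base_key).
# Even table rows are the "top halves" of keys, i.e. the Shift-ed state.
sub table_cell_to_key {
    my ($row, $col) = @_;
    die "Cell ($row, $col) is outside the 8x10 grid"
        if $row < 0 or $row > 7 or $col < 0 or $col > 9;
    my $base  = substr $ISLAND[ int($row / 2) ], $col, 1;
    my $shift = ($row % 2 == 0) ? 1 : 0;
    return ($shift, $base);
}

my ($shift, $key) = table_cell_to_key(3, 1);   # second row of keys, lower half, 2nd column
```

An AltGr bit could be added as a third coordinate in the same way to reach the 8×10×2 table mentioned above.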
Explanation of keyboard layout terms used in the docs
The aim of this module is to make keyboard layout design as simple as possible. It turns out that even very elaborate designs can be made quickly and the process is not very error-prone. It looks like certain avenues not tried before are now made possible; at least I'm not aware of other attempts in this direction. One can make layouts which can be "explained" very concisely, while they contain thousand(s) of accessible letters.
Unfortunately, being on uncharted territory, in my explanations I'm forced to use home-grown terms. So be patient with me... The terms are keyboard layout group, keyboard, face and layer. (One may want to compare them with what ISO 9995 does: http://en.wikipedia.org/wiki/ISO/IEC_9995…. On the other hand, most parts of ISO 9995 look as remote from being ergonomic [in the sense discussed in these sections] as one may imagine!)
In what follows, the words letter and character are used interchangeably. A key means a physical key on a keyboard tapped (possibly together with one of the modifiers Shift, AltGr - or, rarely, [right] Control; more advanced layouts may use “extra” modifiers). The key AltGr is often marked as such on the keycap, otherwise it is just the "right" Alt key; at least on Windows, for many simple layouts it can be replaced by Control-Alt. What is a prefix key? Tapping such a key does not produce any letter, but modifies what the next keypress would do (sometimes it is called a dead key; in ISO 9995 terms, it is probably a latching key. Sometimes, prefix keys may be “chained”; then insertion of a character happens not on the second keypress, but on the third one [or fourth/etc]).
To describe which character (or a prefix) is produced by a keypress one must describe the context: which prefix keys were already tapped, and which modifier keys are currently pressed. It is natural to consider the Shift modifier specially: let’s remove it from the context; now given a context, a keypress may produce two characters: one with Shift, one without. A layer describes such a pair of characters (or prefixes) for every key of the keyboard.
So, the plain layer is the part of the keyboard layout accessible by using only non-prefix keys (possibly in combination with Shift). Many keyboard layouts have up to 2 additional layers accessible without prefix keys: the AltGr-layer and the Control-layer.
On the simplest layouts, such as "US" or "Russian", there are no prefix keys or “extra” modifier keys - but this is only feasible for languages which use very few characters with diacritic marks. However, note that most layouts do not use the Control-layer - sometimes it is claimed that this causes problems with system/application interaction.
A face consists of the layers of the layout accessible with a particular combination of prefix keys. The primary face consists of the plain layer and “additional prefix-less layers” of the layout; it is the part of the layout accessible without switching "sticky state" and without using prefix keys. There may be up to 3 layers (Plain, AltGr, rightControl) per face on the standard Windows keyboard layouts. A secondary face is a face exposed after pressing a prefix key (or a chain of prefix keys).
A personality is a collection of faces: the primary face, plus one face per defined prefix-key (or a prefix chain). Finally, a keyboard layout group is a collection of personalities (switchable by sticky keys [like CapsLock] and/or in other system-specific ways) designed to work smoothly together. For example, in multi-script settings, there may be:
one personality per script (e.g., Latin/Greek/Cyrillic/Arabic);
every personality may have several script-specific additional (“satellite”) faces (one per a particular diacritic for Latin personality, one for regional/historic “flavors” for Cyrillic personality, one per aspiration type for Greek personality, etc);
every personality may also have “liaison” faces accessing the base faces of other personalities;
with chained prefixes, it is easy to design intuitive ways to access satellite faces of other personalities; then every personality will also contain the satellite faces of other personalities (on different prefix chains!).
For access to “technical symbols” (currencies/math/IPA etc), the personalities may share a certain collection of faces assigned to the same prefix keys.
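The nesting of these terms (layer inside face inside personality inside keyboard layout group) can be pictured as a plain data structure. The sketch below is purely illustrative: the hash layout, the names Greek-PREFIX and lookup, and the handful of key assignments are assumptions made for this example and say nothing about how the module stores layouts internally.

```perl
use strict;
use warnings;
use utf8;

# A layer: for every key, an [unshifted, shifted] pair (two keys shown).
my %latin_plain = (a => ['a', 'A'], s => ['s', 'S']);
my %latin_altgr = (a => ['æ', 'Æ']);
my %greek_plain = (a => ['α', 'Α'], s => ['σ', 'Σ']);
my %greek_altgr = (s => ['ς', 'ς']);

# A face: the list of layers reachable in one prefix-key context.
# A personality: the primary face plus one face per prefix key (or chain).
# A keyboard layout group: a collection of personalities.
my %group = (
    Latin => {
        primary  => [\%latin_plain, \%latin_altgr],
        prefixed => { 'Greek-PREFIX' => [\%greek_plain, \%greek_altgr] },
    },
    Greek => {
        primary  => [\%greek_plain, \%greek_altgr],
        prefixed => { 'Latin-PREFIX' => [\%latin_plain, \%latin_altgr] },
    },
);

# Which character does a keypress produce in a given context?
sub lookup {
    my ($personality, $prefix, $layer_no, $shifted, $key) = @_;
    my $p = $group{$personality} or die "No personality '$personality'";
    my $face = defined $prefix ? $p->{prefixed}{$prefix} : $p->{primary};
    die "No such face in personality '$personality'" unless $face;
    my $pair = $face->[$layer_no]{$key} or return;
    return $pair->[ $shifted ? 1 : 0 ];
}
```

For instance, `lookup('Latin', undef, 1, 0, 'a')` reads the AltGr layer of the primary Latin face, while `lookup('Latin', 'Greek-PREFIX', 0, 0, 'a')` reads the plain layer of the face exposed by the (hypothetical) Greek access prefix.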
Example of keyboard layout groups
Start with a very elaborate example (it is more or less a simplified variant of the izKeys layout). A keyboard layout group may consist of phonetically matched Latin and Cyrillic personalities, and visually matched Greek and Math personalities. Several prefix-keys may be shared by all 4 of these personalities; in addition, there would be 4 prefix-keys allowing access to primary faces of these 4 personalities from other personalities of the group. Also, there may be specialised prefix keys tuned for the particular needs of entering Latin script, Cyrillic script, Greek script, and Math.
Suppose that there are 8 specialized-for-Latin prefix-keys (for example, name them grave/tilde/hat/breve/ring_above/macron/acute/diaeresis although in practice each one of them may do more than the name suggests). Then the Latin personality will have the following 13 faces:
Primary/Latin-Primary/Cyrillic-Primary/Greek-Primary/Math-Primary
grave/tilde/hat/breve/ring_above/macron/acute/diaeresis
NOTE: Here Latin-Primary is the face one gets when one presses the Access-Latin prefix-key when in Latin mode; it may be convenient to define it to be the same as Primary - or maybe not. For example, if one defines it to be Greek-Primary, then this prefix-key has a convenient semantic of flipping between Latin and Greek modes for the next typed character: when in Latin, Latin-PREFIX-KEY a would enter α, when in Greek, the same keypresses [now meaning "Latin-PREFIX-KEY α"] would enter "a".
Assume that the only “extra” modifier used by the layout is AltGr. Then each of these faces would consist of two layers: the plain one, and the AltGr one. For example, pressing AltGr with a key on the Greek face could add diaeresis to a vowel, or use a modified ("final" or "symbol") "glyph" for a consonant (as in σ/ς θ/ϑ). Or, on the Latin face, AltGr-a may produce æ. Or, on a Cyrillic personality, AltGr-я (ya) may produce ѣ (yat').
Likewise, the Greek personality may define special prefix-keys to access polytonic Greek vowels. “Chaining” these prefix keys after the Greek-Primary prefix key would make it possible to enter polytonic Greek letters from non-Greek personalities without switching to the Greek personality.
With such a keyboard layout group, to type one Greek word in a Cyrillic text one would switch to the Greek personality, then back to Cyrillic; but when all one needs to type is one Greek letter, it may be easier to use the "Greek-PREFIX-KEY letter" combination, and save switching back to the Cyrillic personality. (Of course, for this to work the letter should be on the primary face of the Greek personality.)
How to make it possible to easily enter a short Greek word when in Cyrillic mode? If one uses one more “extra” modifier key (say, ApplicationMenu), one could reserve combinations of modifiers with this key to “use” another personality. Say, ApplicationMenu-b would enter Greek β, AltGr-ApplicationMenu-b would enter Cyrillic б, etc.
“Onion rings” approach to keyboard layout groups
Looks too complicated? Try to think about it in a different way: there are many faces in a keyboard layout group; break them into 3 "onion rings":
- CORE faces
-
one can "switch to such a face" and type continuously using this face without pressing prefix keys. In other words, these faces can be made "active" (in an OS-dependent way).
When one CORE face is active, the letters in another CORE face are still accessible by pressing one particular prefix key before each of these letters. This prefix key does not depend on which core face is currently "active".
- Universally accessible faces
-
one cannot "switch to them", however, letters in these faces are accessible by pressing one particular prefix key before this letter. This prefix key does not depend on which core face is currently "active".
- satellite faces
-
one cannot "switch to them", and letters in these faces are accessible from one particular core face only. One must press a prefix key before every letter in such faces.
(In presence of “chained prefixes”, the description is less direct: these faces are much easier to access from one particular CORE face. From another CORE face, one must precede this prefix key by the access-that-CORE-face prefix.)
For example, when entering a mix of Latin/Cyrillic scripts and math, it makes sense to make the base-Latin and base-Cyrillic faces into the core; it is convenient when (several) Math faces and a Greek face can be made universally accessible. On the other hand, faces containing diacritized Latin letters and diacritized Cyrillic letters should better be made satellite; this avoids a proliferation of prefix keys which would make typing slower.
Comparing to the terms of the preceding section, the CORE faces correspond to personalities. A personality imports the base face from other personalities; it may also import satellite faces from other personalities.
In a personality, one should make access to satellite faces, the imported CORE faces, and the universally accessible faces as simple as possible. If “other” satellite faces are imported, the access to them may be more cumbersome.
Large Latin layouts: on access to diacritic marks
Every prefix key has a numeric ID. On Windows, there are situations when this numeric ID may be visible to the user. (This module makes every effort to make this happen as rarely as possible. However, this effort blows up the size of the layout DLL, and at some moment one may hit the Windows’ limits for size of the layout DLL. To reduce the size of the DLL, the module makes a triage, and won’t protect the ID from leaking in some rare cases.) When such a leak happens, what the user sees is the character with this codepoint. So it makes sense to choose the ID to be the codepoint of a character “related to what the prefix key ‘does’”.
The logic: if the prefix key adds some diacritic, the ID should be the primary non-ASCII spacing modifier letter related to this diacritic: either one of Latin-1’s 8-bit characters with the high bit set, or, if none has the needed glyph, a suitable non-Latin-1 "spacing modifier letter" or "spacing clone of diacritics".
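The selection rule above can be sketched as a small lookup. The candidate table and the function name prefix_id are illustrative assumptions, not the module's data; the code points themselves are standard Unicode spacing characters. Note that the debugging output shown earlier uses deadKey=02dc (SMALL TILDE), which is consistent with this rule since Latin-1 has no non-ASCII tilde.

```perl
use strict;
use warnings;

# Candidate spacing characters per diacritic (an illustrative selection,
# deliberately not sorted by preference)
my %CANDIDATES = (
    acute     => [0x02CA, 0x00B4],   # MODIFIER LETTER ACUTE ACCENT, ACUTE ACCENT
    diaeresis => [0x00A8],           # DIAERESIS
    tilde     => [0x02DC],           # SMALL TILDE
    hat       => [0x02C6],           # MODIFIER LETTER CIRCUMFLEX ACCENT
    breve     => [0x02D8],           # BREVE
);

# Pick the prefix-key ID: prefer a Latin-1 character with the high bit set
# (0x80..0xFF); otherwise fall back to the first non-Latin-1 candidate.
sub prefix_id {
    my ($diacritic) = @_;
    my $list = $CANDIDATES{$diacritic} or die "Unknown diacritic '$diacritic'";
    my ($latin1) = grep { $_ >= 0x80 && $_ <= 0xFF } @$list;
    return defined $latin1 ? $latin1 : $list->[0];
}

my $id = sprintf '%04x', prefix_id('tilde');
```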
If followed by “special keys”, one should be able to access other related modifier letters and combining characters (see "Classification of diacritics" and the section Diacritics in the example layout); one possible convenient choice is:
- The second press of the prefix key
-
The principal combining mark;
- SPACE
-
The primary non-ASCII spacing modifier letter;
-
The secondary/ternary/etc modifier letter;
- digits (possibly with Shift and/or AltGr)
-
related combining marks (with Shift and/or AltGr, other categories from "Classification of diacritics").
- ' or " (possibly with AltGr)
-
secondary/ternary/etc combining marks (or, if these are on digits, replace by prime-shape modifier chars).
The choice of prefix keys
Some stats on prefix keys: ISO 9995-3 uses 41 prefix keys for diacritics (but 15 are fake, see below!); Apple’s US Extended uses 24 (not counting prefix №, action=specials on the code for this layout:
"'@2#3%5^67*8AaCcEeGghHjJ KkMmNnQqRrsUuvwWYyZz‘’“ default=terminator
№ʺʹƧƨƐɛƼƽƄƅ⁊ȢȣƏəƆɔƎǝƔɣƕǶƞȠ K’ĸƜɯŊŋƢƣƦʀſƱʊʌƿǷȜȝƷʒʻʼʽ №
); bépo uses 20, while EurKey uses 8, and Apple’s US uses 5. On the other end of the spectrum, there are 10 US keyboard keys with a "calculatable" relation to Latin diacritics:
`~^-'",./? --- grave/tilde/hat/macron/acute/diaeresis/cedilla/dot/stroke/hook-above
To this list one may add a "calculatable" key $ as the currency prefix; on the other hand, one should probably remove ? since AltGr-? should better be "set in stone" to denote ¿. If one adds Greek, then the calculatable positions for aspiration are on [ ] (or on ( )). Of widely used Latin diacritics, this leaves out ring/hacek/breve/horn/ogonek/comma (and doubled grave/acute); these diacritics should be either “mixed in” with similar "calculatable" diacritics (for example, <AltGr-,> may create either a character with cedilla or one with ogonek, depending on the character), or should be assigned to less intuitive positions.
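The list of "calculatable" keys and the two adjustments suggested above can be written down as a small Perl table. This is a sketch for illustration only; the hash name `%prefix_for` is hypothetical and is not part of this module's interface.

```perl
use strict;
use warnings;

# The 10 keys and the diacritics they suggest, in the order given above.
my @keys  = split //, q(`~^-'",./?);
my @marks = qw(grave tilde hat macron acute diaeresis cedilla dot stroke hook-above);

my %prefix_for;
@prefix_for{@keys} = @marks;

# Adjustments discussed in the text: AltGr-? is reserved for the inverted
# question mark, and $ becomes the currency prefix.
delete $prefix_for{'?'};
$prefix_for{'$'} = 'currency';

print "$_  $prefix_for{$_}\n" for sort keys %prefix_for;
```

After the adjustments the table still has 10 entries; hook-above loses its "calculatable" key and has to be placed elsewhere.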
Extra prefix keys of ISO 9995-3: breve↓/circumflex↓/comma↑/dot↓/↺breve/long-solidus/low-line/macron↓/short-stroke/vertical-line↑↓. Additionally, the following diacritics produce only 4 precomposed characters: ṲṳḀḁ, so their use as prefix characters is questionable: candrabindu/comma↗↓/diaeresis↓/²breve(↓)/²↺breve/²macron(↓)/²tilde/²vertical-line↑↓/=↓/hook↑/ring↓. (Here ↓ is a shortcut for below, same with ↑ for above, and ↗ for above right; ↺ means inverted, and ² means double. Combined arrows expand to multiple diacritics.)
(Keep in mind that this list is just a conjecture; the standard does not distinguish combining characters and prefix keys, so it is not clear which keypresses produce combining characters, and which are prefix keys.)
What follows is partially deprecated
Parts of the following subsections are better explained in the visual description of the izKeys layout; some other parts duplicate
On principles of intuitive design of Latin keyboard
Using tricks described below, it is easy to create a convenient map of vowels with 3 diacritics `¨´ to the QWERTY keyboard. However, some common (meaning: from Latin-1–10 of ISO 8859) letters from the Latin alphabet cannot be composed this way; they are ÆÐÞÇĲØŒß (one may need to add ªº, as well as ¡¿ for non-alphabetical symbols). It is crucial that these letters may be entered by an intuitively clear key of the keyboard. There is an obvious ASCII letter associated to each of these (e.g., T associated to the thorn Þ), and in the best world just pressing this letter with AltGr-modifier would produce the desired symbol.
Note that ª may be associated to @; then º may be mapped to the nearby 2.
There is only one conflict: both Ø,Œ "want" to be entered as AltGr-O; this is the ONLY piece of arbitrariness in the design so far. After resolving this conflict, AltGr-keys !2ASDCTIO? are assigned their meanings, and cannot carry other letters (call them the “stuck in stone keys”).
(Other keys "stuck in stone" are dead keys: it is important to have the glyph etched on these keyboard's keys similar to the task they perform.)
Then there are several non-alphabetical symbols accessible through ISO 8859 encodings. Assigning them AltGr-access is another important task to perform. Some of these symbols come in pairs, such as ≤≥, «», ‹›, “”, ‘’; it makes sense to assign them to paired keys of the keyboard: <> or [] or ().
However, this task is in conflict of interests with yet another (!) task, so let us explain the needs answered by that task first.
One can always enter accented letters using dead keys; but many people desire a quicker way to access them, by just pressing an AltGr-key (possibly with Shift). The most primitive keyboard designs (such as IBM International or Apple’s US (Extended)
http://www.borgendale.com/uls.htm
http://www.macfreek.nl/memory/Mac_Keyboard_Layout
) omit this step and assign only the NECESSARY letters for AltGr-access. (Others, like Microsoft International, assign only a very small set.)
This problem breaks into two tasks: choosing a repertoire of letters which will be typable this way, and mapping them to the keys of the keyboard. For example, EurKey chooses to use ´¨`-accented characters AEUIO (except for Ỳ), plus ÅÑ; Microsoft International does ÄÅÉÚÍÓÖÁÑß only (and IBM International does none); Bepo does only ÉÈÀÙŸ (but also has the Azeri Ə available, which is not in ISO 8859, and has Ê on the 105th key "2nd \|"), Mac US has none (at least if one does not count uc characters without lc counterparts), same for Mac Extended.
http://bepo.fr/wiki/Manuel
http://bepo.fr/wiki/Utilisateur:Masaru # old version of .klc
http://www.jlg-utilities.com/download/us_jlg.klc
http://tlt.its.psu.edu/suggestions/international/accents/codemacext.html
or look for "a graphic of the special characters" on
http://web.archive.org/web/20080717203026/http://homepage.mac.com/thgewecke/mlingos9.html
Our solution
First, the answer (the alternative, illustrated description is on the visual maps list):
- Rule 0:
-
non-ASCII letters which are not accented by ` ´ ¨ ˜ ˆ ˇ ° ¯ ⁄ are entered by AltGr-keys "obviously associated" to them. Supported: ÆÐÞÇĲŒß.
- Rule 0a:
-
Same is applicable to Ê and Ñ.
- Rule 1:
-
Vowels AEYUIO accented by ¨´` are assigned the so called "natural position": 3 “alphabetic” rows of the keyboard are allocated to accents (¨ is the top, ´ is the middle, ` is the bottom row of 3 alphabetic-rows on keyboard - so À is on ZXCV-row), and are on the same diagonal as the base letter. For left-hand vowels (A,E) the diagonal is in the direction of \, for right-hand vowels (Y,U,I,O) - in the direction of /.
- Rule 1a:
-
If the "natural position" is occupied, the neighbor key in the direction of "the other diagonal" is chosen. (So for A,E it is the /-diagonal, and for right-hand vowels YUIO it is the \-diag.)
- Rule 1b:
-
This neighbor key is below unless the key is on bottom row - then it is above.
Supported by rules "1": all but ÏËỲ.
- Rule 2:
-
Additionally, Å,Ø,Ì are available on keys R,P,V. ª is on @, and º is on the nearby 2.
Clarification:
0. If you remember only Rule 0, you still can enter all Latin-1 letters using Rule 0; all you need to remember is that most of the dead keys are at “obvious” positions: for izKeys it is `';"~^.,-/ for `´¨¨˜ˆ°¸¯ ̸ (¨ is repeated on ;"!) and 6 for ˇ (memorizable as “opposite” of ^ for ˆ).
(What the rule 0 actually says is: "You do not need to memorize me". ;-)
(If you need a diacritic which is only similar to one of the listed diacritics, there is a good chance that the dead key above will do what you need.)
1. If all you remember are rules 1,1a, you can calculate the position of the AltGr-key for AEYUIO accented by `´¨ up to a choice of 3 keys (the "natural key" and its 2 neighbors) - which are quick to try all if you forgot the precise position. If you remember rules 1,1ab, then this choice is down to 2 possible candidates.
Essentially, all you must remember in detail is that the "natural positions" form a V-shape (\ on left, / on right), and in case of bad luck you should move one step in the direction of the other diagonal. Then a letter is either in its "obvious position", or in one of 3 modifications of the “natural position”.
Note that these rules cover ALL the Latin letters appearing in Latin-1..Latin-10, provided we resolve the Œ/Ø-conflict by putting Œ to the key O (since Ø may be entered using AltGr-/ O)!
Motivations:
It is important to have a logical way to quickly understand whether a letter is quickly accessible from a keyboard, and on which key. (Or, maybe, to find a small set of keys on which a letter may be present — then, if one forgets, it is possible to quickly un-forget by trying a small number of keys).
In fact, the problem of choosing “the optimal” assignment (by minimizing the rules to remember) has an almost unique solution. Understanding this solution (to a problem which is essentially combinatorial optimization) may be a great help in memorizing the rules.
The idea: we assign alphabetical Latin characters only to alphabetical keys on the keyboard; this frees the way to use (paired) symbol keys to enter (paired) Unicode symbols. Now observe the diagonals on the alphabetic part of the keyboard: \-diagonals (like EDC) and /-diagonals (like UHB). Each diagonal contains 3 (or less) alphabetic keys; what we want is to assign ¨-accent to the top one, ´-accent to the middle one, and `-accent to the bottom one.
On the left-hand part of the keyboard, use \-diagonals, on the right-hand part use /-diagonals; now each diagonal contains EXACTLY 3 alphabetic keys. Moreover, the diagonals which contain vowels AEYUIO do not intersect!
If we had not decided to have keys set in stone, this would be all: we would get "completely predictable" access to ´¨`-accented characters AEUIO. For example, Ÿ would be accessible on AltGr-Y, Ý on AltGr-G, Ỳ on AltGr-V. Unfortunately, the diagonals contain keys ASDCIO set in stone. So we need a way to "move away" from these keys. The rule is very simple: we move one step down in the direction of the "other" diagonal (/-diagonal on the left half, and \-diagonal on the right half), unless we start on keys A, C, where "down" is impossible and we move up to W or F.
Examples: Ä is on Q; Á "wants to be" on A (used for Æ), so it is moved to W; Ö wants to be on O (already used for Ø or Œ), and is moved away to L; È wants to be on C (occupied by Ç), and is moved away to F.
There is no way to enter Ï using this layout (unless we agree to move it to the "8*" key, which may conflict with convenience of entering typographic quotation marks). Fortunately, this letter is rare (even compared to Ë, which is quite frequent in Dutch). So it is no big deal that it is not available for "handy" input - remember that one can always use dead keys.
http://en.wikipedia.org/wiki/Letter_frequency#Relative_frequencies_of_letters_in_other_languages
Note that the keys P and R are not engaged by this layout; since P is a neighbor of O, it is natural to use it to resolve the conflict between Ø and Œ (which both want to be set in stone on O). This leaves only the key R unengaged; but what we do not cover are two letters, Å and Ñ, which are relatively frequent in Latin-derived European languages.
Note that Ì is moderately frequent in Italian, but Ñ is much more frequent in Spanish. Since Ì and Ñ want to be on the same key (which on many keyboards is taken by Ñ), it makes sense to prefer Ñ… Likewise, Ê is much more frequent than Ë; switch them.
This leaves only the key R unassigned, AND a very rare Ỳ on V. In izKeys, one puts Å and Ì there. This completes the explanation of the rule 2.
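Rules 1, 1a and 1b can be checked mechanically. The following Perl sketch computes the "natural position" on a US QWERTY layout and applies the move-away step when that key is set in stone. It is an illustration under stated assumptions, not the module's implementation: the function names are hypothetical, the set of stone keys is taken from the letters of "!2ASDCTIO?", and the sketch deliberately ignores Rules 0a and 2 as well as collisions between moved letters (which is why Ï has no place).

```perl
use strict;
use warnings;

# The three alphabetic rows of a US QWERTY keyboard, top to bottom.
my @rows = map { [ split //, $_ ] } qw(QWERTYUIOP ASDFGHJKL ZXCVBNM);

# Row allocated to each accent: diaeresis on top, acute in the middle, grave below.
my %accent_row = (diaeresis => 0, acute => 1, grave => 2);

# AltGr keys already taken by Rule 0 (letters of "!2ASDCTIO?").
my %stone = map { $_ => 1 } split //, 'ASDCTIO';

# Left-hand vowels use \-diagonals, right-hand vowels use /-diagonals.
my %left_hand = (A => 1, E => 1);

# Return (row, column) of a letter key.
sub locate {
    my ($key) = @_;
    for my $r (0 .. $#rows) {
        for my $c (0 .. $#{ $rows[$r] }) {
            return ($r, $c) if $rows[$r][$c] eq $key;
        }
    }
    return;
}

# Return the key at (row, column), or nothing if there is no such key.
sub key_at {
    my ($r, $c) = @_;
    return if $r < 0 || $r > $#rows || $c < 0 || $c > $#{ $rows[$r] };
    return $rows[$r][$c];
}

sub altgr_key {
    my ($vowel, $accent) = @_;
    my ($r0, $c0) = locate($vowel);
    my $left = $left_hand{$vowel};
    my $r    = $accent_row{$accent};
    # \-diagonal keeps the column; /-diagonal loses one column per row down.
    my $c    = $left ? $c0 : $c0 - ($r - $r0);
    my $key  = key_at($r, $c);
    return $key unless $stone{$key};             # Rule 1: natural position
    # Rule 1a: step along the other diagonal; Rule 1b: down if possible, else up.
    my $step = $left ? -1 : 0;
    return key_at($r + 1, $c + $step) // key_at($r - 1, $c - $step);
}

print "A+acute on ", altgr_key('A', 'acute'), "\n";
```

For the examples given in the text, the sketch places Ä on Q, moves Á from A to W, Ö from O to L, and È from C to F.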
On possibilities of merging 2 diacritics on one prefix key
With many diacritics, and the limited number of mnemonically-viable positions on the keyboard, it makes sense to merge several diacritics on the same prefix key. Possible candidates are cedilla/ogonek/comma-below (on AltGr-,), dot-above/ring-above/dot-below (on AltGr-.), caron/breve, circumflex/inverted-breve (on AltGr-^). In some cases, only one of the diacritics would be applicable to a particular character. Otherwise, one must decide which of several choices to prefer. The notes below may be useful when designing such preferences. (This module can make most of such choices automatically thanks to knowledge of Unicode ages of characters; this age correlates well with expected frequency of use.)
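A minimal Perl sketch of such merging, using the core module Unicode::Normalize: for each prefix key, try the merged diacritics in a fixed order and take the first one for which a precomposed character exists. The table `%merged`, the function `compose_merged` and the preference order are assumptions made for illustration; in particular, a fixed order is a simplification and not the age-based choice this module makes.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

# Combining marks merged on one prefix key, in an assumed order of preference.
my %merged = (
    ',' => [0x0327, 0x0328, 0x0326],   # cedilla, ogonek, comma below
    '.' => [0x0307, 0x030A, 0x0323],   # dot above, ring above, dot below
);

# Return the first precomposed character found, or nothing if none exists.
sub compose_merged {
    my ($prefix, $base) = @_;
    for my $mark (@{ $merged{$prefix} || [] }) {
        my $composed = NFC($base . chr $mark);
        return $composed if length($composed) == 1;
    }
    return;
}

for my $base (qw(c a e)) {
    my $char = compose_merged(',', $base);
    printf "AltGr-, then %s gives U+%04X\n", $base, ord $char if defined $char;
}
```

Here c takes the cedilla while a falls through to the ogonek, since no precomposed a-with-cedilla exists. For e both exist, so the assumed order decides; this is exactly the case where a preference has to be designed.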
Another trick discussed below is implementing a rare diacritic X by applying the diacritic Y to a character with pre-composed diacritic Z.
U-caron: ǔ, Ǔ which is used to indicate u in the third tone of Chinese language pinyin. But U-breve ŭ/Ŭ is used in Latin encodings. Ǧ/ǧ (G with caron) is used, but only in "exotic" or old languages (has no combined form), while G-breve ğ/Ğ is in Latin encodings. A-breve Ă: A-caron Ǎ is not in Latin-N; apparently, it is used only in pinyin, Zarma, Hokkien, Vietnamese, IPA, transliteration of Old Latin, Bible and Cyrillic's big yus.
In EurKey: only a takes breve, the rest take caron (including G but not U)
Merging ° and dot-accent ˙ in Latin-N: only A and U take °, and they do not take dot-accent. In EurKey: also small w,y take ring accent; same in Bepo - but they do not take dot accent in Latin-N.
Double-´ and cornu (both on a,u only) can be taken by ¨ or ˙ on letters with ¨ precombined (in Unicode ¨ is not precombined with diaeresis or dots). But one must special-case Ë and Ï and Ø (have Ê and IJ instead; IJ takes no accents, but Ê takes acute, grave, tilde and dot below...)! Æ takes acute and macron; Ø takes acute.
Actually, cornu=horn is only on o,u, so using dot/ring on ö and ü is very viable...
So for using AltGr-letter after deadkeys: diaeresis can take dot above, hat and wedge, diaeresis. Likewise, ` and ´ are not precombined together (but there is a combined combining mark). So one can do something else on vowels (ogonek?).
Applying ´ to `-accented forms: we do not have ỳ (on AltGr-keys), so must use "the natural position" which is mixed with Ñ (takes no accents) and Ç (takes acute!!!).
s, t do not precombine with `; so can use for the "alternative cedilla".
Only a/u/w/y take ring, and they do not take cedilla. Can merge.
Bepo's hook above; ảɓƈɗẻểƒɠɦỉƙɱỏƥʠʂɚƭủʋⱳƴỷȥ ẢƁƇƊẺỂƑƓỈƘⱮỎƤƬỦƲⱲƳỶȤ
perl -wlnae "next unless /HOOK/; push @F, shift @F; print qq(@F)" NamesList.txt | sort | less
Of capital letters only T and Y take different kinds of hooks... (And for T both are in Latin-Extended-B...)
Useful tidbits from Unicode mailing list
On keyboards
On MS keyboard (absolutely wrong!)
http://unicode.org/mail-arch/unicode-ml/y2012-m05/0268.html
Symbols for Keyboard keys:
http://unicode.org/mail-arch/unicode-ml/Archives-Old/UML009/0204.html
“Menu key” variations:
http://unicode.org/mail-arch/unicode-ml/Archives-Old/UML009/0239.html
Role of ISO/IEC 9995, switchable keycaps
http://unicode.org/mail-arch/unicode-ml/Archives-Old/UML009/0576.html
On the other hand, having access to text only math symbols makes it possible to implement it in computer languages, making source code easier to read.
Right now, I feel there is a lack of keyboard maps. You can develop them on your own, but that is very time consuming.
http://unicode.org/mail-arch/unicode-ml/y2011-m04/0117.html
Fallback in “smart keyboards” interacting with Text-Service unaware applications
http://unicode.org/mail-arch/unicode-ml/y2014-m03/0165.html
Keyboards - agreement (5 scripts at end)
ftp://ftp.cen.eu/CEN/Sectors/List/ICT/CWAs/CWA-16108-2010-MEEK.pdf
Need for a keyboard, keyman examples; why "standard" keyboards are doomed
http://unicode.org/mail-arch/unicode-ml/y2010-m01/0015.html
http://unicode.org/mail-arch/unicode-ml/y2010-m01/0022.html
http://unicode.org/mail-arch/unicode-ml/y2010-m01/0036.html
http://unicode.org/mail-arch/unicode-ml/y2010-m01/0053.html
History of Unicode
Unicode in 1889
http://www.archive.org/stream/unicodeuniversa00unkngoog#page/n3/mode/2up
Structure of development of Unicode
http://unicode.org/mail-arch/unicode-ml/y2006-m07/0056.html
http://unicode.org/mail-arch/unicode-ml/y2005-m07/0099.html
I don't have a problem with Unicode. It is what it is; it cannot
possibly be all things to all people:
http://unicode.org/mail-arch/unicode-ml/y2005-m07/0101.html
Control characters’ names
http://unicode.org/mail-arch/unicode-ml/y2014-m03/0036.html
Compromises vs reality
http://unicode.org/mail-arch/unicode-ml/y2010-m02/0106.html
http://unicode.org/mail-arch/unicode-ml/y2010-m02/0117.html
Stability of normalization
http://unicode.org/mail-arch/unicode-ml/y2005-m07/0055.html
Universality vs affordability
http://unicode.org/mail-arch/unicode-ml/y2007-m07/0157.html
Drachma
http://unicode.org/mail-arch/unicode-ml/y2012-m05/0167.html
http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3866.pdf
w-ring is a stowaway
http://unicode.org/mail-arch/unicode-ml/y2012-m04/0043.html
History of squared pH (and about what fits into ideographic square)
http://unicode.org/mail-arch/unicode-ml/y2012-m02/0123.html
http://unicode.org/mail-arch/unicode-ml/y2013-m09/0111.html
Silly quotation marks: 201b, 201f
http://en.wikipedia.org/wiki/Quotation_mark_glyphs
http://unicode.org/mail-arch/unicode-ml/y2006-m06/0300.html
http://unicode.org/mail-arch/unicode-ml/y2006-m06/0317.html
http://en.wikipedia.org/wiki/Comma
http://en.wikipedia.org/wiki/%CA%BBOkina
http://en.wikipedia.org/wiki/Saltillo_%28linguistics%29
http://unicode.org/mail-arch/unicode-ml/y2006-m06/0367.html
http://unicode.org/unicode/reports/tr8/
under "4.6 Apostrophe Semantics Errata"
OHM: In modern usage, for new documents, this character should not be used
http://unicode.org/mail-arch/unicode-ml/y2011-m08/0060.html
Uppercase eszett ß ẞ
http://unicode.org/mail-arch/unicode-ml/y2007-m05/0007.html
http://unicode.org/mail-arch/unicode-ml/y2007-m05/0008.html
http://unicode.org/mail-arch/unicode-ml/y2007-m05/0142.html
http://unicode.org/mail-arch/unicode-ml/y2007-m05/0045.html
http://unicode.org/mail-arch/unicode-ml/y2007-m05/0147.html
http://unicode.org/mail-arch/unicode-ml/y2007-m05/0170.html
http://unicode.org/mail-arch/unicode-ml/y2007-m05/0196.html
Should not use (roman numerals)
http://unicode.org/mail-arch/unicode-ml/y2007-m11/0253.html
Colors in Unicode names
http://unicode.org/mail-arch/unicode-ml/y2011-m03/0100.html
Xerox and interrobang
http://unicode.org/mail-arch/unicode-ml/y2005-m04/0035.html
Tibetan (history of encoding, relative difficulty of handling comparing to cousins)
http://unicode.org/mail-arch/unicode-ml/y2013-m04/0036.html
http://unicode.org/mail-arch/unicode-ml/y2013-m04/0040.html
Translation of 8859 to 10646 for Latvian was MECHANICAL
http://unicode.org/mail-arch/unicode-ml/y2013-m06/0057.html
Hyphens:
http://unicode.org/mail-arch/unicode-ml/y2009-m10/0038.html
NOT and BROKEN BAR
http://unicode.org/mail-arch/unicode-ml/y2007-m12/0207.html
http://www.cs.tut.fi/~jkorpela/latin1/ascii-hist.html#5C
Combining power of generative features - implementor's view
http://unicode.org/mail-arch/unicode-ml/y2004-m09/0145.html
Greek and about
OXIA vs TONOS
http://www.tlg.uci.edu/~opoudjis/unicode/unicode_gkbkgd.html#oxia
Greek letters for non-Greek
http://stephanus.tlg.uci.edu/~opoudjis/unicode/unicode_interloping.html#ipa
Macron and breve in Greek dictionaries
http://www.unicode.org/mail-arch/unicode-ml/y2013-m08/0011.html
LAMBDA vs LAMDA
http://unicode.org/mail-arch/unicode-ml/y2010-m06/0063.html
COMBINING GREEK YPOGEGRAMMENI equilibristic (depends on a vowel?)
http://unicode.org/mail-arch/unicode-ml/y2006-m06/0299.html
http://unicode.org/mail-arch/unicode-ml/y2006-m06/0308.html
http://www.tlg.uci.edu/~opoudjis/unicode/unicode_adscript.html
http://unicode.org/mail-arch/unicode-ml/y2008-m05/0046.html
Latin, Cyrillic, Hebrew, etc
Book Spine reading direction
http://www.artlebedev.com/mandership/122/
What is a "Latin" char
http://unicode.org/forum/viewtopic.php?f=23&t=102
Federal vs regional aspects of Latinization (a lot of flak; cp1251)
http://peoples.org.ru/stenogramma.html
Yiddish digraphs
http://unicode.org/mail-arch/unicode-ml/y2011-m10/0121.html
Cyrillic Script, Unicode status (+combining)
http://scriptsource.org/cms/scripts/page.php?item_id=entry_detail&uid=ngc339csy8
http://scriptsource.org/cms/scripts/page.php?item_id=entry_detail&uid=ktxptbccph
The IBM 1401 Hebrew Letter Key
http://www.qsm.co.il/Hebrew/HebKey.htm
GOST 10859
http://unicode.org/mail-arch/unicode-ml/y2009-m09/0082.html
http://www.mailcom.com/besm6/ACPU-128.jpg
Hebrew char input
http://rishida.net/scripts/pickers/hebrew/
http://rishida.net/scripts/uniview/#title
Cyrillic soup
http://czyborra.com/charsets/cyrillic.html
How to encode Latin-in-fraktur
http://unicode.org/mail-arch/unicode-ml/y2007-m01/0279.html
http://unicode.org/mail-arch/unicode-ml/y2007-m01/0263.html
The presentation of the existing COMBINING CEDILLA which has three major forms [ȘșȚț and Latvian Ģģ]
http://unicode.org/mail-arch/unicode-ml/y2013-m06/0045.html
http://unicode.org/mail-arch/unicode-ml/y2013-m06/0066.html
Math and technical texts
Missing: .... skew-orthogonal complement
Math Almost-Text encoding
http://unicode.org/notes/tn28/UTN28-PlainTextMath-v3.pdf
http://unicode.org/mail-arch/unicode-ml/y2011-m10/0018.html
For me 1/2/3/4 means unambiguously ((1/2)/3)/4, i.e. 1/(2*3*4)
Unicode mostly encodes characters that are in use or have been
encoded in other standards. While not semantically agnostic, it is
much less oriented towards semantic clarifications and
distinctions than many people might hope for (and this includes
me, some of the time at least).
Horizontal/vertical line/arrow extensions
http://unicode.org/charts/PDF/U2300.pdf
http://unicode.org/mail-arch/unicode-ml/y2003-m07/0513.html
http://std.dkuug.dk/JTC1/SC2/WG2/docs/n2508.htm
Pretty-printing text math
http://code.google.com/p/sympy/wiki/PrettyPrinting
Sub/Super on a terminal
http://unicode.org/mail-arch/unicode-ml/y2008-m07/0028.html
CR symbols
http://unicode.org/mail-arch/unicode-ml/y2006-m07/0163.html
Math layout
http://unicode.org/mail-arch/unicode-ml/y2007-m01/0303.html
Attempts of classification
http://std.dkuug.dk/jtc1/sc2/wg2/docs/n4384.pdf
http://std.dkuug.dk/JTC1/SC2/WG2/
Buttons Target Also=not-in-series-of-n4384
square 1🞌 2⬝ 3🞍 4▪ 5◾ 6◼ 7■ s⬛ (solid=s⬛)
box 1□ 2🞎 3🞏 4🞐 5🞑 6🞒 7🞓 o⬜ 1🞔 2▣ 3🞕 🞖 =white square (open=o⬜) also: ▫◽◻⌑⧈⬚⸋⊡
black circle 1⋅ 2∙ 3🞄 4⦁ 5⦁ 6⚫ 7● also: ·
ring 1○ 2⭘ 3🞆 4🞆 5🞇 6🞈 7🞉 1⊙ 2🞊 3⦿ 🞋 =white circle also: ⊚⌾◌⚪⚬⨀◦⦾
black diamond 1🞗 2🞘 3⬩ 4🞙 5⬥ 6◆
white diamond ◇ 1🞚 2◈ 3🞛 🞜 also: ⋄
black lozenge 1🞝 2🞞 3⬪ 4🞟 5⬧ 6⧫
white lozenge ◊ 🞠
centered n-gon 3⯅ 4⯀ 5⬟ 6⬣ 8⯃
cent on-corner 3⯆ 4⯁ 5⯂ 6⬢ 8⯄ (also ⯇ ⯈)
cross 1🞡 2🞢 3🞣 4🞤 5🞥 6🞦 7🞧
saltire 1🞨 2🞩 3🞪 4🞫 5🞬 6🞭 7🞮 ≈ times (rotated cross)
5-asterisk 1🞯 2🞰 3🞱 4🞲 5🞳 6🞴
6-asterisk 1🞵 2🞶 3🞷 4🞸 5🞹 6🞺
8-asterisk 1🞻 2🞼 3🞽 4🞾 5🞿
light star 3🟀 4🟄 5🟉 6✶ 8🟎 12🟒
medium star 3🟁 4🟅 5★ 6🟋 8🟏 12🟓
(heavy) star 3🟂 4🟆 5🟊 6🟌 8🟐 12✹
pinwheel 3🟃 4🟇 5✯ 6🟍 8🟑 12🟔 lighter: ✵
Unicode and linguists
Linguists mailing lists
http://unicode.org/mail-arch/unicode-ml/y2009-m06/0066.html
Obsolete IPA
http://unicode.org/mail-arch/unicode-ml/y2009-m01/0487.html
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[%3Asubhead%3D%2F%28%3Fi%29archaic%2F%3A]+&g=
Teuthonista (vowel guide p11, kbd p13)
http://www.sprachatlas.phil.uni-erlangen.de/materialien/Teuthonista_Handbuch.pdf
Glottals
http://unicode.org/mail-arch/unicode-ml/y2008-m05/0151.html
http://unicode.org/mail-arch/unicode-ml/y2008-m05/0163.html
http://unicode.org/mail-arch/unicode-ml/y2008-m05/0202.html
http://unicode.org/mail-arch/unicode-ml/y2008-m05/0205.html
Spaces, invisible characters, VS
Substitute blank
http://unicode.org/mail-arch/unicode-ml/y2011-m07/0101.html
Representing invisible characters
http://unicode.org/mail-arch/unicode-ml/y2011-m07/0094.html
Ignorable glyphs
http://unicode.org/mail-arch/unicode-ml/y2007-m08/0132.html
http://unicode.org/mail-arch/unicode-ml/y2007-m08/0138.html
http://unicode.org/mail-arch/unicode-ml/y2007-m08/0120.html
HOWTO: (non)dummy VS in fonts
http://unicode.org/mail-arch/unicode-ml/y2007-m08/0118.html
ZWSP ZWNJ WJ SHY NON-BREAKING HYPHEN
http://unicode.org/mail-arch/unicode-ml/y2007-m08/0123.html
http://unicode.org/mail-arch/unicode-ml/y2007-m07/0188.html
http://unicode.org/mail-arch/unicode-ml/y2007-m07/0199.html
http://unicode.org/mail-arch/unicode-ml/y2007-m07/0201.html
http://unicode.org/mail-arch/unicode-ml/y2007-m06/0122.html
http://unicode.org/mail-arch/unicode-ml/y2007-m01/0297.html
On which base to draw a "standalone" diacritics
http://unicode.org/mail-arch/unicode-ml/y2007-m07/0075.html
Variation sequences
http://unicode.org/mail-arch/unicode-ml/y2004-m07/0246.html
Typesetting
Upside-down text in CSS (remove position?)
http://unicode.org/mail-arch/unicode-ml/y2012-m01/0037.html
Unicode to PostScript
http://unicode.org/mail-arch/unicode-ml/y2009-m06/0056.html
http://www.linuxfromscratch.org/blfs/view/svn/pst/enscript.html
http://unicode.org/mail-arch/unicode-ml/y2009-m06/0062.html
Spacing: English and French
http://unicode.org/mail-arch/unicode-ml/y2006-m09/0167.html
http://unicode.org/mail-arch/unicode-ml/y2008-m05/0103.html
http://unicode.org/mail-arch/unicode-ml/y2007-m08/0138.html
Chicago Manual of Style
http://unicode.org/mail-arch/unicode-ml/y2006-m01/0127.html
Coloring parts of ligatures. Implementations:
http://unicode.org/mail-arch/unicode-ml/y2005-m06/0195.html
http://unicode.org/mail-arch/unicode-ml/y2005-m06/0233.html
http://unicode.org/mail-arch/unicode-ml/y2005-m06/0208.html
GPOS
http://unicode.org/mail-arch/unicode-ml/y2005-m06/0167.html
Chinese typesetting
http://idsgn.org/posts/the-end-of-movable-type-in-china/
@fonts and non-URL URIs
http://unicode.org/mail-arch/unicode-ml/y2010-m01/0156.html
Looking at the future
Why and how to introduce innovative characters
http://unicode.org/mail-arch/unicode-ml/y2012-m01/0045.html
Unicode knows the concept of a provisional property
http://unicode.org/mail-arch/unicode-ml/y2011-m11/0142.html
http://unicode.org/reports/tr23/
http://unicode.org/mail-arch/unicode-ml/y2011-m11/0161.html
If you want to make analogies, however, the ISO ballots constitute
the *provisional* publication for character code points and names.
that needs to be available from day one for a character to be
implementable at all (such as decomp mappings, bidi class,
code point, name, etc.).
ZERO-WIDTH UNDEFINED DECOMPOSITION MARK
- to define decomposition, prepend it
Exciting new letter forms for English
http://www.theonion.com/articles/alphabet-updated-with-15-exciting-new-replacement,2869/
Proposing new stuff, finding new stuff proposed
http://unicode.org/mail-arch/unicode-ml/y2008-m01/0238.html
http://www.unicode.org/mail-arch/unicode-ml/y2013-m09/0056.html
A useful set of criteria for encoding symbols is found in Annex H of this document:
http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3002.pdf
Unsorted
Summary views into CLDR
http://www.unicode.org/cldr/charts//by_type/patterns.characters.html
http://www.unicode.org/cldr/charts//by_type/misc.exemplarCharacters.html
Pound
http://unicode.org/mail-arch/unicode-ml/y2012-m05/0242.html
Classification of Dings (bats etc)
std.dkuug.dk/jtc1/sc2/wg2/docs/n4115.pdf
Escape: 2be9 2b9b
ARROW SHAFT - various
Locales
http://blog.kyero.com/2011/11/14/what-is-the-common-locale-data-repository/
http://blog.kyero.com/2010/12/02/lost-in-translation-locales-not-languages/
http://unicode.org/mail-arch/unicode-ml/y2006-m06/0203.html
General
http://ebixio.com/online_docs/UnicodeDemystified.pdf
Diacritics in fonts
http://unicode.org/mail-arch/unicode-ml/y2011-m05/0047.html
http://www.user.uni-hannover.de/nhtcapri/combining-marks.html#greek
Licences (GPL etc) in TV sets
http://unicode.org/mail-arch/unicode-ml/y2009-m12/0092.html
Similar glyphs:
http://unicode.org/reports/tr39/data/confusables.txt
GeoLocation by IP
http://unicode.org/mail-arch/unicode-ml/y2009-m04/0197.html
Per language character repertoire:
http://unicode.org/mail-arch/unicode-ml/y2009-m04/0253.html
http://unicode.org/mail-arch/unicode-ml/y2009-m04/0255.html
Dates/numbers in Unicode
http://unicode.org/mail-arch/unicode-ml/y2010-m02/0122.html
Normalization FAQ
http://www.macchiato.com/unicode/nfc-faq
Apostrophe
http://unicode.org/mail-arch/unicode-ml/y2008-m05/0060.html
http://unicode.org/mail-arch/unicode-ml/y2008-m05/0063.html
http://unicode.org/mail-arch/unicode-ml/y2008-m05/0066.html
http://unicode.org/mail-arch/unicode-ml/y2007-m07/0251.html
http://unicode.org/mail-arch/unicode-ml/y2007-m05/0309.html
Apostrophe as soft sign
http://unicode.org/mail-arch/unicode-ml/y2010-m08/0123.html
Questionnaire at start of Unicode proposal
http://unicode.org/mail-arch/unicode-ml/y2007-m05/0087.html
Rubi
http://en.wikipedia.org/wiki/Ruby_character#Unicode
Tamil/ISCII
http://unicode.org/faq/indic.html
http://unicode.org/versions/Unicode6.1.0/ch09.pdf
http://www.brainsphere.co.in/keyboard/tm.pdf
CGI and OpenType
http://unicode.org/mail-arch/unicode-ml/y2008-m02/0097.html
Numbers in scripts ;-)
http://unicode.org/mail-arch/unicode-ml/y2008-m02/0120.html
Indicating coverage of the font
http://unicode.org/mail-arch/unicode-ml/y2008-m02/0152.html
http://unicode.org/mail-arch/unicode-ml/y2008-m02/0167.html
Accessing ligatures
http://unicode.org/mail-arch/unicode-ml/y2007-m11/0210.html
Folding characters
http://unicode.org/reports/tr30/tr30-4.html
Writing systems vs written languages
http://unicode.org/mail-arch/unicode-ml/y2005-m07/0198.html
http://unicode.org/mail-arch/unicode-ml/y2005-m07/0241.html
MS Visual OpenType tables
http://www.microsoft.com/typography/VOLT.mspx
http://www.microsoft.com/typography
"Same" character Oacute used for different "functions" in the same text
http://unicode.org/mail-arch/unicode-ml/y2004-m08/0019.html
etc:
http://unicode.org/mail-arch/unicode-ml/y2004-m07/0227.html
Diacritics
http://www.sil.org/~gaultney/ProbsOfDiacDesignLowRes.pdf
http://en.wikipedia.org/wiki/Sylfaen_%28typeface%29
http://tiro.com/Articles/sylfaen_article.pdf
Sign writing
http://std.dkuug.dk/jtc1/sc2/wg2/docs/n4342.pdf
Writing digits in non-decimal
http://unicode.org/mail-arch/unicode-ml/y2011-m03/0050.html
Which separator is less ambiguous? Breve ˘ ? ␣ ? Inverted ␣ ?
Use to identify a letter:
http://unicode.org/charts/collation/
Perl has problems with unpaired surrogates (whole thread)
http://unicode.org/mail-arch/unicode-ml/y2010-m11/0034.html
Complex fonts (e.g., Indic)
http://unicode.org/mail-arch/unicode-ml/y2010-m10/0049.html
Complex glyphs in Symbola (pre-6.01) font may crash older versions of Windows
http://unicode.org/mail-arch/unicode-ml/y2010-m10/0082.html
http://unicode.org/mail-arch/unicode-ml/y2010-m10/0084.html
Windows 7 SP1 improvements
http://babelstone.blogspot.de/2010/05/prototyping-tangut-imes-or-why-windows.html
Middle dot is ambiguous
http://unicode.org/mail-arch/unicode-ml/y2010-m09/0023.html
http://unicode.org/mail-arch/unicode-ml/y2013-m03/0151.html
Superscript == modifiers
http://unicode.org/mail-arch/unicode-ml/y2010-m03/0133.html
Translation of Unicode names
http://unicode.org/mail-arch/unicode-ml/y2012-m12/0066.html
http://unicode.org/mail-arch/unicode-ml/y2012-m12/0076.html
Transliteration on passports (see p.IV-48), UniDEcode
http://www.icao.int/publications/Documents/9303_p1_v1_cons_en.pdf
http://unicode.org/mail-arch/unicode-ml/y2013-m11/0025.html
Keyboard input on Windows: interaction of applications and the kernel
Keyboard input on Windows, Part I: what is the kernel doing?
This is not documented. We try to provide a description which is both as simple as possible, and as complete as possible. (We ignore many important parts: the handling of hot keys [or C-A-Del], IME, handling of focus switch [Alt-Tab etc], the synchronization of keystate between different queues, waking up the system, the keyboard filters, widening of virtual keycodes, and LED lights.)
We omit Step 0, when the hardware keyboard drivers (PS/2 or USB) deliver keydown/up(/repeat???) event for scan codes of corresponding keys. (This is a complicated topic, but well-documented.)
The scan codes are massaged (see “Low level scancode mapping” in "SEE ALSO").
The keyboard layout tables map the translated scancode to a virtual keycode. (This may also depend on the “modification column”; see "Far Eastern keyboards on Windows".) The “internal” key state table is updated.
Mythology: the modification keys (Shift, Alt, Ctrl etc) are taken into account.
What actually happens: any key may act as a modification key. The keyboard layout tables map keycodes to 8-bit masks. (The customary names for lower bits of the mask are KBDSHIFT, KBDCTRL, KBDALT, KBDKANA; two more bits are named KBDROYA and KBDLOYA [after OYAYUBI 親指, meaning THUMB]; two more bits are unnamed.) The keycodes of the currently pressed keys (from the “internal” table) are translated to masks, and these masks are ORed together. (For the purpose of translation to WM_CHAR/etc [done in ToUnicode()/ToUnicodeEx()], the bit KBDKANA may be set also when key VK_KANA was pressed odd number of times; this is controlled by KANALOK flag in a virtual key descriptor [of the key being currently processed] of the keyboard layout tables.)
The keyboard layout tables translate the ORed mask to a number called “modification column”. (These two numbers are completely hidden from applications. The only glimpse the applications get is in the [useless, since there is no way to map it to anything “real”] result of VkKeyScanEx().)
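The two hidden numbers can be modelled in a few lines. This is only a sketch: the bit values are the customary ones from kbd.h, while the tables VK_TO_MASK and MASK_TO_COLUMN are hypothetical stand-ins for what a real layout would define.

```python
# Customary bit values from kbd.h.
KBDSHIFT, KBDCTRL, KBDALT, KBDKANA = 0x01, 0x02, 0x04, 0x08

# Hypothetical layout table: the mask contributed by each virtual key.
VK_TO_MASK = {"VK_SHIFT": KBDSHIFT, "VK_CONTROL": KBDCTRL, "VK_MENU": KBDALT}

# Hypothetical layout table: ORed mask -> modification column.
# A mask missing from the table has no column.
MASK_TO_COLUMN = {
    0: 0,
    KBDSHIFT: 1,
    KBDCTRL: 2,
    KBDCTRL | KBDALT: 3,
    KBDSHIFT | KBDCTRL | KBDALT: 4,
}

def ored_mask(keys_down):
    """OR together the masks of all keys which are currently down."""
    mask = 0
    for vk in keys_down:
        mask |= VK_TO_MASK.get(vk, 0)   # non-modifier keys contribute nothing
    return mask

def modification_column(keys_down):
    """Return the modification column, or None if the layout defines none."""
    return MASK_TO_COLUMN.get(ored_mask(keys_down))
```

With these sample tables, Ctrl+Alt selects column 3, while Alt alone has no column at all.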
Depending on the current “modification column”, the virtual keycode of the current key event may be massaged further. (See "Far Eastern keyboards on Windows".) Numpad keycodes depend also on the state of NumLock, provided the keyboard layout table marks them with KBDNUMPAD flag. A few other scancodes may also produce different virtual keycodes in different situations (e.g., Break).
When KLLF_ALTGR flag is present, fake presses/releases of left Ctrl are generated on presses/releases of right Alt. With keypad presses/releases in presence of VK_SHIFT and NumLock, fake releases/presses of VK_SHIFT are generated.
If needed, asynchronous key state for the current key's non-left-non-right flavor is updated. (The rest is dropped if the key is consumed by a WH_KEYBOARD_LL hook.)
Asynchronous key state for the current key is updated. Numpad-by-number flags are updated. (The rest is dropped if the key is a hotkey.)
The message WM_(SYS)KEYDOWN/UP is posted to the application. If VK_MENU [usually called the Alt key] is down, but VK_CONTROL is not, the event is of SYS flavor (this info is duplicated in lParam. Additionally, for VK_MENU tapping, the UP event is also made SYS, although at this moment VK_MENU is not down!). (The KBDEXT flag [of the scancode] is also delivered to the application.)
(When a WM_(SYS)KEYDOWN/UP message is posted, the key state is updated. This key state may be used by TranslateMessage() as an argument to ToUnicode(), and is returned by GetKeyState() etc.)
The following steps are applicable only if the application uses “the standard message pump” with TranslateMessage()/DispatchMessage() or uses some equivalent code.
Before the application dispatches WM_(SYS)KEYDOWN/UP to the message handler, TranslateMessage() calls ToUnicode() with wFlags = 0 (unless a popup menu is active; then wFlags = 1, which disables character-by-number input via numeric KeyPad) and the buffer of 16 UTF-16 code units.
The UTF-16 code units obtained from ToUnicode() are posted via PostMessage(). All the code units but the last one are marked by FAKE_KEYSTROKE flag in lParam. If the initial message was WM_SYSKEYDOWN, the SYS flavor is posted; if ToUnicode() returns a deadkey, the DEAD flavor is posted.
(The bit ALTNUMPAD_BIT is set/used only for the console handler.)
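The posting step can be illustrated by a small model. It is a sketch only: posted_messages() is a hypothetical helper, the FAKE_KEYSTROKE flag is represented by a boolean instead of a bit in lParam, and each character of the input string is assumed to be one UTF-16 code unit (BMP only).

```python
BUFFER_UNITS = 16   # size of the buffer TranslateMessage() gives to ToUnicode()

def posted_messages(units, sys_flavor=False, dead=False):
    """Model which messages are posted for the code units from ToUnicode().

    Returns (message_name, code_unit, fake_keystroke) triples in posting order.
    """
    units = units[:BUFFER_UNITS]            # longer strings are truncated
    name = "WM_" + ("SYS" if sys_flavor else "") + ("DEAD" if dead else "") + "CHAR"
    last = len(units) - 1
    # All the code units but the last one carry the FAKE_KEYSTROKE mark.
    return [(name, unit, index != last) for index, unit in enumerate(units)]
```

For example, a key producing the two-unit string "ab" yields two WM_CHAR messages, of which only the first is marked as fake.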
Keyboard input on Windows, Part II: The semantic of ToUnicode()
The syntax of ToUnicode() is documented, the semantic is not. Here we fix this.
If the bit 0x01 in wFlags is not set, the key event is checked for contributing to character-by-number input via numeric KeyPad (and numpad-by-number flags are updated). If so, the character is delivered only when Alt is released. (This is the only case when KEYUP delivers a character.) Unless the bit 0x02 in wFlags is set, the KEYUP events are not processed any more.
The flag KLLF_LRM_RLM is acted upon, and VK_PACKET is processed.
The keys which are currently down are mapped to the ORed bitmap (see above).
If the key event does not contribute to input-by-number via numeric keypad, and KBDALT is set, and no other bits except KBDSHIFT, KBDKANA are set: then the bit KBDALT is removed from the ORed mask.
If CapsLock is active, KBDSHIFT state is flipped in the following cases: either at most KBDSHIFT is set in the bitmap, and CAPLOK is set in the descriptor, or both KBDALT and KBDCTRL are set in the bitmap, and CAPLOKALTGR is set in the descriptor.
Now the ORed bitmap is converted to the modification column (see above).
The key descriptor for the current virtual keycode is consulted (the “row” of the table). If SGCAPS flag is on, CapsLock is active, and no other bits but KBDSHIFT are set in the bitmap, the row is replaced by the next row.
The entry at the row/column is extracted; if defined, it is either a string (zero or more UTF-16 code units), or a dead key ID (one UTF-16 unit). (Implementation: the ID is taken from the next row of the table.)
(If the ORed mask corresponds to a valid modification column, but the row does not define the behaviour at this column, and the bit KBDCTRL is set, and no other bits but KBDSHIFT, KBDKANA are set, then an autogenerated character in the range 0x00..0x1f is emitted for virtual keycodes 'A'..'Z' and widened virtual keycodes 0xFF61..0xFF91 [for the latter, based on the low bits of translation-to-scancode].)
The resulting units are fed to the finite automaton. When the automaton is in 0-state, a fed character unit is passed through, and a fed deadkey ID sets the state of the automaton to this number. In non-0 state, the IDs behave the same as numerically equal character units; the behaviour is described by the keyboard layout tables. The automaton changes the state according to the input; it may also emit a character (= 1 code unit; then it is always reset to 0 state). When “unrecognized input” arrives, the automaton emits the ID and the input, and resets to 0 state.
(On KEYUP event, the changes to the state of the finite-automaton are ignored. This is only relevant if wFlags has bit 0x02 set.)
After UTF-16 units are passed through the automaton, its output is returned by ToUnicode(). If the automaton is in non-0 state, the state ID becomes the output.
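The deadkey finite automaton described in these steps can be sketched as follows. This is a model under assumptions, not the kernel's code: the class DeadKeyAutomaton and the shape of its transition table are hypothetical, and code units are represented by one-character strings.

```python
class DeadKeyAutomaton:
    """Model of the deadkey automaton; state 0 means no pending deadkey."""

    def __init__(self, transitions):
        # transitions: {(state_id, input_unit): (result_unit, result_is_dead)}
        self.transitions = transitions
        self.state = 0

    def feed(self, unit, is_dead=False):
        """Feed one unit; return the list of code units emitted by this step."""
        if self.state == 0:
            if is_dead:
                self.state = unit        # a deadkey ID becomes the new state
                return []
            return [unit]                # plain characters pass through
        # In a non-0 state, deadkey IDs behave as equal character units.
        hit = self.transitions.get((self.state, unit))
        if hit is None:                  # unrecognized input: emit ID and input
            emitted = [self.state, unit]
            self.state = 0
            return emitted
        result, chained = hit
        if chained:                      # chained deadkey: only the state changes
            self.state = result
            return []
        self.state = 0                   # emitting a character resets the state
        return [result]
```

A table such as {("^", "a"): ("â", False)} makes the sequence deadkey "^" followed by "a" emit "â", while "^" followed by an unlisted "z" emits both "^" and "z".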
NOTE: MSKLC restricts the length of the string associated to the row/column cell to be at most 4 UTF-16 code units. The restriction for keyboard layouts created with other tools is the maximal length 255 bytes storable in KBDTABLES.cbLgEntry; it results in the maximal string length of 125 code units.
NOTE: If the application uses the standard message pump with TranslateMessage()/DispatchMessage(), the caller of ToUnicode() is TranslateMessage(). In this case, ToUnicode() is called with an output buffer consisting of 16 UTF-16 code units. For such applications, the strings associated to keypresses are truncated after 16 code units.
NOTE: If the string is “long” (i.e., defined via LIGATURES), when it is fed through the finite automaton, the transitions to non-0 state do not generate deadkey IDs in the output string. (The LIGATURES may contain strings of one code unit! This may lead to non-obvious behaviour! If pressing such a key after a deadkey generates a chained deadkey, this would happen without delivering WM_DEADKEY message.)
NOTE: How does the kernel recognize which key sequences contribute to character-by-number input via numeric KeyPad? First, the starter keydown must happen when the ORed mask contains KBDALT, and no other bits except KBDSHIFT and KBDKANA. (E.g., one can press Alt, then tap f 1 2 3, release Alt [with 1,2,3 on the numeric keypad]. This would deliver Alt-f, then 1 would start character-by-number input provided Alt and NumPad1 together have ORed mask “in between” of KBDALT and KBDALT|KBDSHIFT|KBDKANA.)
After the starter keydown (NumPad: 0..9, DOT, PLUS) is recognized as such, all the keydowns should be followed by the corresponding keyup (keydowns-due-to-repeat are ignored); more precisely, between two KEYDOWN events, the KEYUP for the first of them must be present. (In other words, KEYDOWN/KEYUP events must come in the expected order, maybe with some intermixed “extra” KEYUP events.) In the decimal mode (numeric starter) only the keys with scancodes of NumPad 0..9 are allowed. In the hex mode (starter is NumPad's DOT or PLUS) also the keys with virtual codes '0'..'9' and 'A'..'F' are allowed. The sequence is terminated by releasing VK_MENU (=Alt) key.
NOTE: In most cases, the resulting number is reduced mod 256. The exceptions are: the starter key is KeyPadPLUS, or the translate-to codepage is multibyte (then a number above 255 is interpreted as big-endian combination of bytes). In multibyte codepages, numbers 0x80..0xFF are considered in cp1252 codepage (unless the translate-to codepage is Japanese, and the number’s codepoint is Katakana).
NOTE: If the starter key is KeyPad0 or KeyPadDOT, the number is a codepoint in the default codepage of the keyboard layout; if it is another digit, it is in the OEM codepage. Enabling hex modes (KeyPadPLUS or KeyPadDOT) requires extra tinkering; see "Hex input of unicode is not enabled".
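The rules of the last three notes may be summarized by a small model. It is a sketch under assumptions: only single-byte codepages are modelled (the multibyte exceptions are left out), the function names are hypothetical, and treating the KeyPadPLUS result as a Unicode codepoint is inferred from the U+0F12 example used elsewhere in this document.

```python
def classify_starter(starter):
    """starter is '0'..'9', 'DOT' or 'PLUS' (keys of the numeric keypad).

    Returns (mode, interpretation of the resulting number).
    """
    if starter == "PLUS":
        return ("hex", "unicode")
    if starter == "DOT":
        return ("hex", "layout-default codepage")
    if starter == "0":
        return ("decimal", "layout-default codepage")
    if starter in set("123456789"):
        return ("decimal", "OEM codepage")
    raise ValueError("not a starter key: %r" % (starter,))

def resulting_number(starter, continuation):
    """Number entered by the starter followed by the continuation digits."""
    mode, _ = classify_starter(starter)
    if mode == "hex":
        number = int(continuation, 16)          # DOT/PLUS is not a digit itself
    else:
        number = int(starter + continuation)    # a decimal starter is a digit
    # Only the KeyPadPLUS starter escapes the reduction mod 256.
    return number if starter == "PLUS" else number % 256
```

So Alt with KeyPadPLUS, F, 1, 2 gives 0x0F12, while Alt with 2, 5, 7 on the keypad gives 257 mod 256 = 1.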
NOTE: since keyboard layouts normally map Alt to the mask KBDALT, and do not define a modification column for the ORed mask =KBDALT, and KBDALT is NOT stripped for key events in input-by-number, these key events usually do not generate spurious WM_CHARs.
NOTE: if the bit 0x01 of wFlags is intended to be set, then there is a way to query the kernel “what would happen if a particular key with a particular combination of modifiers were pressed now”. (Recall that a “usual” ToUnicode() call is “destructive”: it modifies the state of the keyboard stored in the kernel. The information about whether one is in the middle of entering-by-number and/or whether one is in the middle of a deadkey sequence is erased or modified by such calls.) In general, there is no way to preserve the state of entering-by-number; however, in presence of bit 0x01, this is of no concern, so a solution exists.
Using wFlags=0x01|0x02, and setting the high bit of wScanCode gives the same result as ToUnicode() with wFlags=0x01 and no high bit in wScanCode. Moreover, this preserves the state of the deadkey-finite-automaton. This way, one gets a “nondestructive” flavor of ToUnicode().
Keyboard input on Windows, Part III: Customary “special” keybindings of typical keyboards
Typically, keyboards define a few keypresses which deliver “control” characters (for the benefit of console applications). As shown above, even if the keyboard does not define Control-letter combinations (but does define a modification column for Ctrl which is associated to KBDCTRL, maybe with KBDSHIFT, KBDKANA intermixed), WM_CHAR with ^letter will be delivered to the application. The same will happen for combinations with modifiers which produce only KBDCTRL, KBDSHIFT, KBDKANA.
Additionally, the typical keyboards also define the following bindings:
Ctrl-Space ——→ 0x20
Esc, Ctrl-[ ——→ 0x1b
Ctrl-] ——→ 0x1d
Ctrl-\ ——→ 0x1c
BackSpace ——→ ^H
Ctrl-BackSpace ——→ 0x7f
Ctrl-Break ——→ ^C
Tab ——→ ^I
Enter ——→ ^M
Ctrl-Enter ——→ ^J
In addition to this, the standard US keyboard (and keyboards built by this Perl module) define the following bindings with Ctrl-Shift modifiers:
@ ——→ 0x00
^ ——→ 0x1e
_ ——→ 0x1f
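Most of these bindings agree with the usual ASCII caret convention, in which ^X denotes the code of X with all but the low five bits cleared. A short sketch to check this (caret() and the BINDINGS table are hypothetical names introduced for illustration; Ctrl-Space and Ctrl-BackSpace do not follow the convention and are left out):

```python
def caret(ch):
    """Control code written ^ch in caret notation: low five bits of the code."""
    return ord(ch) & 0x1F

# Bindings from the lists above, keyed by the character on the key.
BINDINGS = {
    "[": 0x1B, "]": 0x1D, "\\": 0x1C,      # with Ctrl
    "@": 0x00, "^": 0x1E, "_": 0x1F,       # with Ctrl-Shift on the US keyboard
}

def follows_caret_rule(ch):
    """True if the listed binding equals the caret-notation control code."""
    return caret(ch) == BINDINGS[ch]
```

The same function gives the codes written symbolically in the list: ^H is 0x08, ^I is 0x09, ^J is 0x0a, ^M is 0x0d, and ^C is 0x03.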
Can an application on Windows accept keyboard events? Part I: insert only
The logic described above makes the kernel deliver more or less “correct” WM_(SYS)CHAR messages to the application. The only bindings which may be defined in the keyboard layout, but will not be seen as WM_(SYS)CHAR are those in modification columns which involve KBDALT, and do not involve any bits except KBDSHIFT and KBDKANA. (Due to the stripping of KBDALT described above, these modification columns are never accessed; well, they are, but only for input-by-number.)
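The stripping rule can be written down as a short function. This is a model of the behaviour described in this document, not kernel code; strip_alt() is a hypothetical name, and the bit values are the customary ones from kbd.h.

```python
KBDSHIFT, KBDCTRL, KBDALT, KBDKANA = 0x01, 0x02, 0x04, 0x08

def strip_alt(mask, input_by_number=False):
    """Return the ORed mask after the kernel's KBDALT stripping."""
    if input_by_number:
        return mask                      # no stripping during input-by-number
    only_allowed = not (mask & ~(KBDALT | KBDSHIFT | KBDKANA))
    if mask & KBDALT and only_allowed:
        return mask & ~KBDALT            # KBDALT is silently dropped
    return mask
```

A column assigned to KBDALT|KBDSHIFT is therefore unreachable by ordinary keypresses: the mask becomes KBDSHIFT before the column lookup. Masks with KBDCTRL or with one of the other bits are left alone.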
Try to design an application with an entry field; the application should insert ALL the characters “delivered for insertion” by the keyboard layout and the kernel. The application should not do anything else for all the other keyboard events. First, ignore the KBDALT stripping.
Then the only WM_(SYS)CHAR which are NOT supposed to insert the contents to the editable UI fields are the "Customary “special” keybindings" described above. They are easy to recognize and ignore: just ignore all the WM_(SYS)CHAR carrying characters in the range 0x00..0x1f, 0x7f, and ignore 0x20 delivered when one of Ctrl keys is down. So the application which inserts all the other WM_(SYS)CHARs will follow the intent of the keyboard as closely as possible.
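As a sketch, this filter is a single predicate on the delivered character code (should_insert() is a hypothetical name; the KBDALT stripping is deliberately ignored here, as in the text):

```python
def should_insert(code, ctrl_down):
    """Decide whether a WM_(SYS)CHAR payload goes into the entry field.

    code: the delivered character code; ctrl_down: is any Ctrl key down.
    """
    if code <= 0x1F or code == 0x7F:
        return False                 # the customary control characters
    if code == 0x20 and ctrl_down:
        return False                 # Ctrl-Space
    return True
```

A plain space is inserted, Ctrl-Space is not; every other printable character is inserted whatever modifiers produced it.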
Now return to consideration of KBDALT stripping. If the application follows the policy above, pressing Alt-b would enter b, provided Alt is mapped to KBDALT, as done on standard keyboards. So the application should recognize which WM_CHAR carrying b are actually due to stripping of KBDALT, and should not insert the delivered characters.
Here comes the major flaw of the Windows’ keyboard subsystem: the kernel translates SCANCODE → VK_CODE → ORED_MASK → MODIFICATION_COLUMN, then operates in terms of ORed masks and modification columns. The application can access only the first two levels of this translation; one cannot query the kernel for any information about the last two numbers. (Except for the API VkKeyScanEx(), but it is unclear how this API may help: it translates “in wrong direction” and covers only BMP.) Therefore, there is no bullet-proof way to recognize when WM_(SYS)CHAR arrived due to KBDALT stripping.
NOTE: of course, if only Shift/Alt/Ctrl keys are associated to non-0 ORed mask bitmaps, and they are associated to the “expected” KBDSHIFT/KBDALT/KBDCTRL bits, then the application would easily recognize this situation by checking whether Alt is down, but Ctrl is not. (Also observe that this is exactly the situation distinguishing WM_CHAR from WM_SYSCHAR; no surprises here!)
Assuming that the application uses this method, it would correctly recognize stripped events on the “primitive” keyboards. However, on a keyboard with an extra modifier key (call it Super; assume its mask involves a non-SHIFT/ALT/CTRL/KANA bit), the Alt-Super-key combination will not be stripped by the kernel, but the application would think that it was, and would not insert the character in WM_CHAR message. A bug!
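The mismatch is easy to reproduce in a model. In the sketch below, the table VK_TO_MASK, the key name "SUPER" and its bit 0x10 are hypothetical; the first function plays the role of the kernel (which sees ORed masks), the second one is the naive application (which sees only virtual keys).

```python
KBDSHIFT, KBDCTRL, KBDALT, KBDKANA = 0x01, 0x02, 0x04, 0x08
KBD_EXTRA = 0x10   # hypothetical bit assigned to the extra "Super" modifier

VK_TO_MASK = {"VK_SHIFT": KBDSHIFT, "VK_CONTROL": KBDCTRL,
              "VK_MENU": KBDALT, "SUPER": KBD_EXTRA}

def kernel_strips_alt(keys_down):
    """What the kernel does: decide by the ORed mask."""
    mask = 0
    for vk in keys_down:
        mask |= VK_TO_MASK.get(vk, 0)
    return bool(mask & KBDALT) and not mask & ~(KBDALT | KBDSHIFT | KBDKANA)

def naive_guess_stripped(keys_down):
    """What the naive application does: Alt is down, but Ctrl is not."""
    return "VK_MENU" in keys_down and "VK_CONTROL" not in keys_down
```

For Alt-b both functions agree. For Alt-Super-b the kernel does not strip, yet the naive guess still says “stripped”, so the delivered character would be wrongly discarded.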
Moreover, if “supporting only the naive mapping” were a feasible restriction, there would be no reason for the kernel to go through the extra step of “the ORed mask”. Actually, to have a keyboard which is simultaneously backward compatible, easy for users, and covering a sufficiently wide range of possible characters, one must use more or less convoluted implementations (as in "A convenient assignment of KBD* bitmaps to modifier keys").
CONCLUSION: the fact that the kernel and the applications speak different incompatible languages makes even the primitive task discussed here impossible to code in a bullet-proof way. A heuristic workaround exists, but it will not work with all keyboards and all combinations of modifiers.
CAVEAT with the above assignment: some applications (e.g., Emacs) manage to distinguish lCtrl+lAlt combination of modifier keys from the combination lCtrl+rAlt produced by a typical AltGr; these applications are able to use lCtrl+lAlt-modified keys as bindable accelerator keys. We address this question in the Part IV.
Can an application on Windows accept keyboard events? Part II: special key events
In the preceding section, we considered the most primitive application accepting the user inserting of characters, and nothing more. “Real applications” must support also keyboard actions different from “insertion”; so those KEYDOWN events which are not related to insertion may trigger some “special actions”. To model a full-featured keyboard input, consider the following specification:
As above, the application has an entry field, and should insert ALL the characters “delivered for insertion” by the keyboard layout and the kernel. For all the keyboard events not related to insertion of characters, the application should write to the log file which of Ctrl/Alt/Shift modifiers were down, and the virtual keycode of the KEYDOWN event. Again, at first, we ignore the KBDALT stripping.
At first, the problem looks simple: with the standard message pump, when WM_(SYS)KEYDOWN message is processed, the corresponding WM_(SYS)(DEAD)CHAR messages are already sent to the message queue. One can PeekMessage() for these messages; if present, and not “special”, they correspond to “insertion”, so nothing should be written to the log. Otherwise, one reports this WM_(SYS)KEYDOWN to the log.
Unfortunately, this solution is wrong. Inspect again what the kernel is delivering during the input-by-number via numeric keyboard: the KEYDOWN for decimal/hex digits is a part of the “insertion”, but it does not generate any WM_(SYS)(DEAD)CHAR. Essentially, the application may see Alt-F pressed during the processing of Alt-NumPadPlus+F+1+2, but even if Alt-F is supposed to format the paragraph, this action should not be triggered (but U+0F12 should be eventually inserted).
CONCLUSION: Input-by-number is getting in the way of using the standard message pump. SOLUTION: one should write a clone of TranslateMessage() which delivers suitable WM_USER* messages for KEYDOWN/KEYUP involved in Input-by-number. Doing this, one can also remove silliness from the Windows’ handling of Input-by-number (such as taking mod 256 for numbers above 255).
POSSIBLE IMPLEMENTATION: MyTranslateMessage() should:
When not handling input-by-number, call ToUnicode(), but use wFlags=1, so that ToUnicode() does not handle input-by-number.
Recognize input-by-number starters by the scancode/virtual-keycode, the presence of VK_MENU down, and the fact that ToUnicode() produces nothing or '0'..'9','.',',','+'.
After the starter, allow continuation by checking the scancode/virtual-keycode and the presence of VK_MENU down. Do not call ToUnicode() for continuation keydown/up events.
After a chain of continuations followed by KEYUP for VK_MENU, one should PostMessage() for WM_(UNI)CHAR with accumulated input.
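The bookkeeping part of such a MyTranslateMessage() can be sketched as a small state machine. This is a simplified model under assumptions: the class InputByNumber and the key names (NUMPAD0..NUMPAD9, NUMPAD_DOT, NUMPAD_PLUS) are hypothetical, the check of what ToUnicode() produces for a starter is omitted, an unexpected key is assumed to abort the sequence, and the accumulated number is returned instead of being posted.

```python
DECIMAL_KEYS = {"NUMPAD%d" % d: str(d) for d in range(10)}
HEX_EXTRA_KEYS = {c: c for c in "0123456789ABCDEF"}   # virtual codes '0'..'9','A'..'F'

class InputByNumber:
    def __init__(self):
        self.alt_down = False
        self.digits = None        # None means "no input-by-number in progress"
        self.base = 10

    def key_down(self, key):
        """Return True if the KEYDOWN is consumed by input-by-number."""
        if key == "VK_MENU":
            self.alt_down = True
            return False
        if not self.alt_down:
            return False
        if self.digits is None:                       # look for a starter
            if key in DECIMAL_KEYS:
                self.digits, self.base = DECIMAL_KEYS[key], 10
                return True
            if key in ("NUMPAD_DOT", "NUMPAD_PLUS"):
                self.digits, self.base = "", 16
                return True
            return False
        allowed = dict(DECIMAL_KEYS, **HEX_EXTRA_KEYS) if self.base == 16 else DECIMAL_KEYS
        if key in allowed:                            # continuation
            self.digits += allowed[key]
            return True
        self.digits = None                            # assumption: abort
        return False

    def key_up(self, key):
        """On KEYUP of VK_MENU return the accumulated number, else None."""
        if key != "VK_MENU":
            return None
        self.alt_down = False
        digits, self.digits = self.digits, None
        return int(digits, self.base) if digits else None
```

With this model, the F in Alt-NumPadPlus+F+1+2 is reported as consumed (so no Alt-F accelerator should fire), and releasing Alt yields 0x0F12 without any reduction mod 256.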
Combining this with the heuristical recognition of stripped KBDALT, one gets an architecture with a naive approximation to handling of Alt (but still miles ahead of all the applications I saw!), and bullet-proof handling of other combinations of modifiers.
NOTE: this implementation of MyTranslateMessage() loses one “feature” of the original one: that input-by-number is disabled in the presence of (popup) menu. However, since I never saw this “feature” in action (and never have heard of it described anywhere), this must be of negligible price.
NOTE: ALL the applications I checked do this logic wrong. Most of them check FIRST for “whether the key event looks like those which should trigger special actions”, then perform these special actions (and ignore the character payload).
As shown above, the reasonable way is to do this in the opposite order, and check for special actions only AFTER it is known that the key event does not carry a character payload. The impossibility of reversing the order of these checks is due to the same reason as one discussed above: the kernel and application speaking different languages.
Indeed, since the application knows nothing about ORed masks, it has no way to distinguish that, for example, lCtrl-rCtrl-= may be SUPPOSED to be distinct from lCtrl-= and rCtrl-=, and while the last two do not carry the character payload, the first one does. Checking FIRST for the absence of WM_(SYS)(DEAD)CHAR delegates such a discrimination to the kernel, which has enough information about the intent of the keyboard layout. (Likewise, the keyboard may define the pair of DEADKEY and Ctrl-A to insert ᵃ. Then Ctrl-A alone will not carry any character payload, but its combination with a deadkey may.)
Why are the applications trying to grab the potential special-key messages as early as possible? I suspect that the developers are afraid that otherwise, a keyboard layout may “steal” important accelerators from the application. While this is technically possible, nowadays keyboard accelerators are rarely the only way to access features of the applications; and among hundreds of keyboard layouts I saw, all but 2 or 3 would not “steal” anything from applications. (Or maybe the developers just have no clue that the correct solution is so simple?)
NOTE: Among the applications I checked, the worst offender is Firefox. It follows a particularly unfortunate advice by Mike Kaplan and tries to reconstruct the above-mentioned row/column table of the keyboard layout, then uses this (heuristically reconstructed) table as a substitute for the real thing. And due to the mismatch of languages spoken by kernel and applications, working via such an attempted reconstruction turns out to have very little relationship to the actually intended behaviour of the keyboard (the behaviour observed in less baroque applications). In particular, if a keyboard uses different modification columns for lCtrl-lAlt and AltGr=rAlt modifiers, pressing AltGr-key inputs wrong characters in Firefox.
NOTE: Among notable applications which fail spectacularly is Emacs. The developers forget that for a generation, it is already XXI century; so they use ToAscii() instead of ToUnicode()! (Even if ToUnicode() is available, its result is converted to the result of the corresponding ToAscii() code.)
In addition to 8-bitness, Emacs also suffers from check-for-specials-first syndrome…
Can an application on Windows accept keyboard events? Part III: better detection of KBDALT stripping
We explained above that it is not possible to make a bullet-proof algorithm handling the case when KBDALT might have been stripped by the kernel. The very naive heuristic algorithm described there will recognize the simplest cases, but will also have many false positives: for many combinations it will decide that KBDALT was stripped while it was not. The result will be that when the kernel reports that the character X is delivered, the application would interpret it as Alt-X, so X would not be inserted. It will not handle, for example, the lAlt-Menu-key modifier combinations with the assignment of masks from that section.
Indeed, with this assignment, the only combination of modifiers for which the kernel will strip KBDALT is lAlt (and lAlt+Win if one does not assign any bits to Win). So lAlt-Menu-key is not stripped, hence the correct WM_*CHAR is delivered by the kernel. However, since this combination is still visible to the application as having Alt, and not having Ctrl, it is delivered as the SYS flavor.
So the net result is: one designed a nice assignment of masks to the modifier keys. This assignment makes keypresses successfully navigate around the quirks of the kernel’s calculations of the character to deliver. However, the naive algorithm used by the application will force the application to ignore this correctly delivered character to insert.
A very robust workaround for this problem is introduced in the Part IV. What we discuss here is a simple heuristic to recognize the combinations involving Alt and an “unexpected modifier”, so that these combinations become exceptions to the rule “SYS flavor means ‘do not insert’”.
NAIVE SOLUTION: when WM_SYS*CHAR message arrives, inspect the virtual keycodes which are reported as pressed. Ignore the keycode for the current message. Ignore the keycodes for “usual modifiers” (Shift/Alt/Kana) which are expected to keep stripping. Ignore the keycode for the keys which may be kept “stuck down” by the keyboards (see "Far Eastern keyboards on Windows"). If some keycode remains, then consider it as an “extra” modifier, and ignore the fact that the message was of SYS flavor.
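A sketch of this heuristic, with the set of pressed keys passed in explicitly. The function name mirrors HasExtraModifiersHeuristical(); the particular members of USUAL_MODIFIERS and STUCK_KEYS are assumptions made for illustration, not a definitive list.

```python
# "Usual" modifiers which are compatible with KBDALT stripping.
USUAL_MODIFIERS = {"VK_SHIFT", "VK_LSHIFT", "VK_RSHIFT",
                   "VK_MENU", "VK_LMENU", "VK_RMENU", "VK_KANA"}

# Keys which Far Eastern layouts may keep "stuck down" (assumed list).
STUCK_KEYS = {"VK_DBE_ALPHANUMERIC", "VK_DBE_HIRAGANA", "VK_DBE_KATAKANA",
              "VK_DBE_SBCSCHAR", "VK_DBE_DBCSCHAR",
              "VK_DBE_ROMAN", "VK_DBE_NOROMAN",
              "VK_DBE_CODEINPUT", "VK_DBE_NOCODEINPUT"}

def has_extra_modifiers_heuristical(keys_down, current_vk, stuck=STUCK_KEYS):
    """True if some pressed key is neither the current key, nor a usual
    modifier, nor a possibly-stuck key. Then the SYS flavor of the message
    should be ignored, and the character inserted."""
    remaining = set(keys_down) - {current_vk} - USUAL_MODIFIERS - set(stuck)
    return bool(remaining)
```

In a real application the set of pressed keys would come from the key state reported by the system; here it is just an argument, so the logic can be examined in isolation.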
So all one must do is to define one user message (for input-by-number-in-progress), code two very simple routines, MyTranslateMessage() and HasExtraModifiersHeuristical(), and perform two PeekMessage() on KEYDOWN event, and one gets a powerful almost-robust algorithm for keyboard input on Windows. (Recall that all the applications I saw provide close-to-abysmal support of keyboard input on Windows.)
Can an application on Windows accept keyboard events? Part IV: application-specific modifiers
Some applications handle certain keys as “extra modifiers for the purpose of application-specific accelerator keypresses”. For example, Emacs may treat the ApplicationMenu in this way (as a Super modifier for its bindable-keys framework). Usually, ApplicationMenu does not contribute anything into the ORed mask; hence, ApplicationMenu-letter combination will deliver the same character as just letter alone. When the application treats ApplicationMenu-letter as an accelerator, it must ignore the character delivered by this combination.
Additionally, many keyboard layouts use the KLLF_ALTGR flag (it makes the kernel fake pressing/releasing the left Ctrl key when the right Alt is pressed/released) with “standard” assignments of the ORed masks. On such keyboards, pressing right Alt (i.e., AltGr) delivers the same characters as pressing any Ctrl together with any Alt. On the other hand, an application may distinguish left-Ctrl combined with left-Alt from AltGr pressed on such keyboards by inspecting which (virtual) keys are currently down. So the application may consider left-Ctrl combined with left-Alt as “intended to be an accelerator”; then the application would ignore the characters delivered by such a keypress.
One can immediately see that such applications would inevitably enter into conflict with keyboards which define these key combinations. For example, on a keyboard which defines an ORed mask for ApplicationMenu, pressing ApplicationMenu-letter should deliver a different character than pressing letter. However, the application does not know this, and just ignores the character delivered by ApplicationMenu-letter.
A similar situation arises when the keyboard defines leftCtrl-leftAlt-letter to deliver a different character than AltGr-letter. Again, the character will be ignored by the application. Since the fact that such an “unusual” keyboard is active implies user's intent, such behaviour is a bug of the application.
CONCLUSION: an application must interpret a keypress as “intended to be an accelerator” only if this keypress produces no character, or produces the same character as the key without the “extra” modifier. (Correspondingly, if replacing leftAlt by rightAlt does not change the delivered character.)
IMPLEMENTATION: to do this, the application must be able to query “what would happen if the user pressed different key combinations?”; such a query requires “non-destructive” calls of ToUnicode(). (These calls must be done before the “actual”, destructive, call of ToUnicode() corresponding to the currently pressed down modifiers.)
Fortunately, with the framework described in the Part III, the call of ToUnicode() is performed with wFlags being 0x01. As explained near the end of the section "Keyboard input on Windows, Part II: The semantic of ToUnicode()", this call has a “non-destructive” flavor! Hence, for applications with such “enhanced” modifier keys, the logic of the Part III should be enhanced in the following ways:
Make a non-destructive call of ToUnicode(). Store the result. If no insertable character (or deadkey) is delivered, ignore the rest.
If both left Ctrl and left Alt are down (AND right Ctrl AND right Alt are up!) replace left Alt by the right Alt, and make another non-destructive call of ToUnicode(). If the result is identical to the first one, mark left Ctrl+left Alt as “special modifiers present for accelerators”.
Remove left Ctrl and left Alt from the collection of keys which are down (argument to ToUnicode()), and continue with the previous step. (This may be generalized to other combinations of left/right Alt/Ctrl.)
For every other “special modifier” virtual key which is down, make another non-destructive call of ToUnicode() with this virtual key up. If the result is identical to the first one, mark this “special modifier” as “present for accelerators”.
Finally, if nothing suitable for accelerators is found, make a “usual” call of ToUnicode() (so that on future keypresses the deadkey finite automaton behaves as expected). Generate the corresponding messages.
If no insertable character is delivered, or suitable “extra” accelerators are found, the process-the-accelerator logic should be triggered.
For example, if the character Ω is delivered, and a special modifier ApplicationMenu is down and marked as suitable as accelerator, then Ω will be ignored. The accelerator for ApplicationMenu-Ω should be triggered. (Processing this as ApplicationMenu-Shift-ω may be also done. This may require an extra non-destructive call.)
An alternative logic is possible: if this Ω was generated by modifiers lCtrl-rAlt-Shift-ApplicationMenu with the virtual key VK_W, then the application may query what VK_W generates standalone (for example, cyrillic ц), and trigger the accelerator for Ctrl-Alt-Shift-ApplicationMenu-ц. (This assumes that lCtrl-rAlt-Shift with VK_W generates the same Ω!)
If no character is delivered, then this is a “trivial” situation, and the framework of accelerator keys should be called as if the complication considered here did not exist.
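The detection of “modifiers present for accelerators” can be sketched with the non-destructive ToUnicode() call replaced by a caller-supplied function. This is a simplified reading of the steps above: query(keys_down, vk) is a hypothetical stand-in returning the delivered string, the step of removing left Ctrl and left Alt and repeating is omitted, and the final “usual” call is left to the caller.

```python
def find_accelerator_modifiers(query, keys_down, vk, special=("VK_APPS",)):
    """Return (delivered string, set of keys marked as accelerator modifiers).

    query(frozenset_of_keys_down, vk) models a non-destructive ToUnicode().
    """
    keys = set(keys_down)
    first = query(frozenset(keys), vk)
    marked = set()
    if not first:
        return first, marked                  # nothing delivered: trivial case
    left_pair = {"VK_LCONTROL", "VK_LMENU"}
    if left_pair <= keys and not keys & {"VK_RCONTROL", "VK_RMENU"}:
        # Replace left Alt by right Alt, and compare the results.
        swapped = (keys - {"VK_LMENU"}) | {"VK_RMENU"}
        if query(frozenset(swapped), vk) == first:
            marked |= left_pair
    for modifier in special:
        # Does releasing this special modifier change the delivered string?
        if modifier in keys and query(frozenset(keys - {modifier}), vk) == first:
            marked.add(modifier)
    return first, marked
```

On a layout where ApplicationMenu (VK_APPS) contributes nothing, ApplicationMenu-W delivers the same string as W, so VK_APPS gets marked and the accelerator logic may be triggered. On a layout which assigns its own character to ApplicationMenu-W, nothing is marked and the character is inserted.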
NOTE: this logic handles the intended behaviour of Alt key as well! So, with this implementation, the application would:
Handle Alt-NUMPAD input-by-number in an intuitive way which is mostly compatible with Windows (but not bug-for-bug compatible with the Windows' way);
Recognize Alt modifier which does not change the delivered character as such (so it may be processed as the menu accessor);
Recognize all the key combinations defined by the keyboard layout (and deliverable via ToUnicode());
Recognize all the application-specific extra modifier keys which do not interfere with the key combinations defined by the keyboard layout.
Far Eastern keyboards on Windows
The syntax of defining these keyboards is documented in kbd.h of the toolkit. The semantic of the NLS table is undocumented. Here we fix this.
The function returning the NLS table should be exported with ordinal 2. The offsets of both tables in the module should be below 0x10000. The keyboard layout should define a function with ordinal 3 or 5 returning 0, or be loaded through such a function returning non-0; the signature is
BOOL ordinal5(HKL hkl, LPWSTR __OUT__ dllname , PCLIENTKEYBOARDTYPE type_if_remote_session, LPVOID dummy);
BOOL ordinal3(LPWSTR __OUT__ dllname);
If the return value is non-0, the keyboard is reloaded from dllname.
In short, these layouts have an extra table which may define the following enhancements:
One 3-state (or 2-state) radio-button:
on keys with VK codes DBE_ALPHANUMERIC/DBE_HIRAGANA/DBE_KATAKANA (the third state can be also toggled independently of the others).
Three Toggling (like CAPSLOCK) button (pairs):
toggling radio-button-like VK codes DBE_SBCSCHAR/DBE_DBCSCHAR, DBE_ROMAN/DBE_NOROMAN, DBE_CODEINPUT/DBE_NOCODEINPUT
Make key produce different VK codes with different modifiers.
Make a “reverse NUMPAD” translation.
Manipulate a couple of bits of IME state.
A few random hacks for key-deficient hardware layouts.
(Via assigning ORed masks to radio-buttons, the radio-buttons and toggle-buttons above may affect the layout. Using this, it is easy to convert each toggling button to a 2-state radiobutton. The limitation is that the number of modification columns compatible with the extra table is at most 8, counting one for Ctrl.)
Every VK may be associated to two tables of functions, the “normal” one, and the “alternative” one. For every modification column, each table assigns a filter id, and a parameter for the filter. (Recall that columns are associated to the ORed masks by the table in the MODIFIERS structure. One must define all the entries in the table, or at least the entries reachable by the modifier keys. NOTE: the limit on the number of states in the tables is 8; it is not clear what happens with the states above this; some versions of Windows may buffer-overflow.)
The input/output for the filters consists of: the VK, UP/DOWN flag, the flags associated to the scancode in KBDTABLES->ausVK (may be added to upstream), the parameter given in VK_F structure (and an unused DWORD read/write parameter). A filter may change these parameters, then pass the event forward, or it may ignore an event. Filters by ID:
KBDNLS_NULL Ignore key (should not be called; only for unreachable slots in the tables).
KBDNLS_NOEVENT Ignore key.
KBDNLS_SEND_BASE_VK Pass through VK unchanged.
KBDNLS_SEND_PARAM_VK Replace VK by the number specified as the parameter.
KBDNLS_KANAMODE Ignore UP; on DOWN, toggle (=generate UP-or-DOWN for) DBE_KATAKANA
These 3 generate UP for “other” key, then DOWN for the target (as needed!):
KBDNLS_ALPHANUM Ignore UP; DBE_ALPHANUMERIC,DBE_HIRAGANA,DBE_KATAKANA → DBE_ALPHANUMERIC
KBDNLS_HIRAGANA Ignore UP; DBE_ALPHANUMERIC,DBE_HIRAGANA,DBE_KATAKANA → DBE_HIRAGANA
KBDNLS_KATAKANA Ignore UP; DBE_ALPHANUMERIC,DBE_HIRAGANA,DBE_KATAKANA → DBE_KATAKANA
KBDNLS_SBCSDBCS Ignore UP; Toggle DBE_SBCSCHAR / DBE_DBCSCHAR
KBDNLS_ROMAN Ignore UP; Toggle DBE_ROMAN / DBE_NOROMAN
KBDNLS_CODEINPUT Ignore UP; Toggle DBE_CODEINPUT / DBE_NOCODEINPUT
KBDNLS_HELP_OR_END Pass-through if NUMPAD flag ON (in ausVK); send-or-toggle HELP/END (see below)
KBDNLS_HOME_OR_CLEAR Pass-through if NUMPAD flag ON (in ausVK); send HOME/CLEAR (see below)
KBDNLS_NUMPAD If !NUMLOCK | SHIFT, replace NUMPADn/DECIMAL by no-numpad flavors
KBDNLS_KANAEVENT Replace VK by the number specified as the parameter. On DOWN, see below
KBDNLS_CONV_OR_NONCONV See below
The startup values are ALPHANUMERIC, SBCSCHAR, NOROMAN, NOCODEINPUT.
Typical usages:
KBDNLS_KANAMODE (VK_KANA (Special case))
KBDNLS_ALPHANUM (VK_DBE_ALPHANUMERIC)
KBDNLS_HIRAGANA (VK_DBE_HIRAGANA)
KBDNLS_KATAKANA (VK_DBE_KATAKANA)
KBDNLS_SBCSDBCS (VK_DBE_SBCSCHAR/VK_DBE_DBCSCHAR)
KBDNLS_ROMAN (VK_DBE_ROMAN/VK_DBE_NOROMAN)
KBDNLS_CODEINPUT (VK_DBE_CODEINPUT/VK_DBE_NOCODEINPUT)
KBDNLS_HELP_OR_END (VK_HELP or VK_END) [NEC PC-9800 Only]
KBDNLS_HOME_OR_CLEAR (VK_HOME or VK_CLEAR) [NEC PC-9800 Only]
KBDNLS_NUMPAD (VK_xxx for Numpad) [NEC PC-9800 Only]
KBDNLS_KANAEVENT (VK_KANA) [Fujitsu FMV oyayubi Only]
KBDNLS_CONV_OR_NONCONV (VK_CONVERT and VK_NONCONVERT) [Fujitsu FMV oyayubi Only]
Toggle (= 2-state) and 3-state radio-keys are switched by sending KEYUP for the currently “active” key, then KEYDOWN for the newly activated key. When switching 3-state, additional action happens depending on the new state:
DBE_ALPHANUMERIC If IME is off, and KANA toggle is on, switch IME on in the KATAKANA mode
DBE_HIRAGANA If IME is off, and KANA toggle is off, switch IME off in the ALPHANUMERIC mode
DBE_KATAKANA SAME AS HIRAGANA
Additionally, KEYDOWN of KBDNLS_KANAEVENT switches IME to
KANA toggle on: switch IME off in the ALPHANUMERIC mode
KANA toggle off: switch IME on in the KATAKANA mode
and KBDNLS_CONV_OR_NONCONV (on KEYUP and KEYDOWN) passes through, and does
KANA toggle on, IME off: switch IME off in the ALPHANUMERIC mode
otherwise: Do nothing
(The semantics of IME being-in/switching-to OFF/ON mode is not clear (probably IME-specific). The switching happens by calling RequestDeviceChange(pDeviceInfo, GDIAF_IME_STATUS, TRUE) for devices with a handle and type == DEVICE_TYPE_KEYBOARD, while putting the request into global memory - unless the IMECOMPAT_HYDRACLIENT flag is set on the foreground keyboard.)
For KBDNLS_HOME_OR_CLEAR, the registry is checked at startup. For KBDNLS_HELP_OR_END, the registry is checked at startup, and:
KANA_AWARE: flips END/HELP if KANA toggle is ON (on input, “HELP” means not-an-END)
otherwise: sends END/HELP depending on what registry says.
The checked values are helpkey, KanaHelpKey, clrkey in the hive RTL_REGISTRY_WINDOWS_NT\WOW\keyboard.
Which of the two tables is chosen is controlled by the type (NULL/NORMAL/TOGGLE) of the key's tables, and the (per key) history bit. The initial state of the bit is in NLSFEProcCurrent (StuxNet hits here!). The tables of type NULL are ignored (the key descriptor passes all events through); the NORMAL key uses only the first table. The TOGGLE key uses the first table on KEYDOWN, and uses the first or the second table on KEYUP. The choice depends on modifiers present in the preceding KEYDOWN; the bitmap NLSFEProcSwitch is indexed by the modification column of the KEYDOWN event; the second table is used on the following KEYUP if the indexed bit is set. (The KEYREPEAT events are handled the same way as KEYUP.)
The typical usage of TOGGLE keys is to make the KEYUP event match what KEYDOWN did, no matter in which order the modifier keys and the main key are released. Having the history bit up “propagates” to KEYUP the information about which modifiers were active on KEYDOWN. This helps in ensuring consistency of some actions between the KEYDOWN event and the corresponding KEYUP event. Remember that the state of modifiers on KEYUP is often different from the state on KEYDOWN, since people can release modifiers in different orders:
press-Shift, press-Enter, release-Shift, release-Enter ---> Shift-Enter pressed, Enter released
press-Shift, press-Enter, release-Enter, release-Shift ---> Shift-Enter pressed and released
If pressing Shift-Enter acts as if it were the F38 key (and only so with Shift!), then to ensure consistency one would need to make both releasing Shift-Enter and releasing Enter act as if it were the F38 key. So one can make pressing Shift-Enter special (via the first table), set the history bit on Shift-Enter, and make the second table map Enter and Shift-Enter to be special too (send F38) if the history bit is set.
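The Shift-Enter example can be played through in a small model. This is only a sketch of the logic as described in this section, not of the kernel code: the hash layout, the function names and the two-column setup (column 0 = no modifiers, column 1 = Shift) are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy model of one TOGGLE-type key, following the description above.
# Modification columns: 0 = no modifiers, 1 = Shift.
my %enter = (
    first   => [ 'ENTER', 'F38' ],  # "normal" table, indexed by column
    second  => [ 'F38',   'F38' ],  # "alternative" table, indexed by column
    switch  => 0b10,                # bit N set: KEYDOWN in column N raises the history bit
    history => 0,                   # per-key history bit
);

# Return the event that the filter would pass forward.
sub key_event {
    my ($key, $direction, $column) = @_;
    if ($direction eq 'DOWN') {
        # KEYDOWN always uses the first table and records the history bit.
        $key->{history} = ($key->{switch} >> $column) & 1;
        return "DOWN $key->{first}[$column]";
    }
    # KEYUP uses the second table only if the preceding KEYDOWN raised the bit.
    my $table = $key->{history} ? 'second' : 'first';
    return "UP $key->{$table}[$column]";
}

# Press the key in one column, release it in another.
sub press_and_release {
    my ($down_column, $up_column) = @_;
    my $down = key_event(\%enter, 'DOWN', $down_column);
    my $up   = key_event(\%enter, 'UP',   $up_column);
    return "$down, $up";
}

print press_and_release(1, 0), "\n";  # Shift released first: DOWN F38, UP F38
print press_and_release(1, 1), "\n";  # Enter released first: DOWN F38, UP F38
print press_and_release(0, 0), "\n";  # plain Enter:          DOWN ENTER, UP ENTER
```

With a NORMAL key (first table only), the first scenario would end with UP ENTER, leaving F38 "stuck" from the point of view of an application.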
Remark: the standard key processing has its own filters too. AltGr processing adds fake lCtrl up/down events (provided the flag KLLF_ALTGR is set); Shift-Cancels-CapsLock processing ignores/fakes the KEYDOWN/KEYUP for VK_CAPITAL (=CapsLock) (provided the flag KLLF_SHIFTLOCK is set); Shift-Multiply becomes VK_SNAPSHOT (same for Alt); Ctrl-ScrollLck/Numlock become VK_CANCEL/VK_PAUSE; Ctrl-Pause may become VK_CANCEL. OEM translations (NumPad→Cursor, except C-A-Del; 00 to double-press of 0) come first, then locale-specific (AltGr, Shift-Cancels-CapsLock), then those defined in the tables above.
Remark: As opposed to these translations, KLLF_LRM_RLM and Alt-NUMPADn are actually handled inside the event loop, by ToUnicode().
Remark: http://www.toppa.com/2007/english-windows-xp-with-a-japanese-keyboard/ (and references inside!) explains fine points of using Japanese keyboards. See also: http://www.coscom.co.jp/learnjapanese801/lesson08.html.
A convenient assignment of KBD* bitmaps to modifier keys
In this section, we omit discussion of the Shift modifier; so every bitmap may be further combined with KBDSHIFT to produce two different bindings. Assign ORed masks to the modifier keys as follows:
lCtrl Win lAlt rAlt Menu rCtrl
CTRL|LOYA CTRL|X1 ALT|KANA CTRL|ALT|LOYA|X1 CTRL|ALT|X2 CTRL|ALT|ROYA
with suitable backward-compatible mapping of ORed masks to modification columns. This assignment allows using the KLLF_ALTGR flag (faking presses of lCtrl when rAlt is pressed, which greatly increases compatibility of rAlt with brain-damaged applications), all the combinations involving at most one of lCtrl, Win or rAlt give distinct ORed masks, it avoids stripping of KBDALT on lAlt combined with other modifiers, makes CapsLock work with all relevant combinations, while completely preserving all application-visible properties of keyboard events [except those with lCtrl-Win-lAlt- modifiers; this combination is equivalent to lAlt-rAlt-].
Note that ignoring the CTRL and ALT bits, all combinations of LOYA,KANA,X1,X2,ROYA are possible, which gives at least 32 Shift-pairs. In fact, the only combination of LOYA,KANA,X1,X2,ROYA which may appear with different CTRL,ALT bits is LOYA|X1; hence there are 33 possible combinations of CTRL,ALT,LOYA,KANA,X1,X2,ROYA. Indeed, CTRL is determined by LOYA|X1|X2|ROYA. If one of KANA,X2,ROYA is present, then ALT is set; so assume KANA,X2,ROYA are not present. But then, if ALT may be set, then both LOYA and X1 must be present; which gives the only duplication.
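The count can be double-checked by brute force. The sketch below enumerates all 64 subsets of the six modifier keys with the masks from the table above. The numeric bit values are chosen arbitrarily for illustration (only their distinctness matters), and the names %BIT, %KEY, mask() are made up for this example.

```perl
use strict;
use warnings;

# Illustrative bit values; only distinctness matters for the count.
my %BIT = (CTRL => 0x02, ALT => 0x04, KANA => 0x08, ROYA => 0x10,
           LOYA => 0x20, X1  => 0x40, X2   => 0x80);

sub mask { my $m = 0; $m |= $BIT{$_} for @_; return $m }

# The assignment of ORed masks to modifier keys given above.
my %KEY = (
    lCtrl => mask(qw(CTRL LOYA)),
    Win   => mask(qw(CTRL X1)),
    lAlt  => mask(qw(ALT KANA)),
    rAlt  => mask(qw(CTRL ALT LOYA X1)),
    Menu  => mask(qw(CTRL ALT X2)),
    rCtrl => mask(qw(CTRL ALT ROYA)),
);
my @keys     = sort keys %KEY;
my $ctrl_alt = mask(qw(CTRL ALT));

my (%seen, %rest);
for my $subset (0 .. 2**@keys - 1) {
    my $m = 0;
    $m |= $KEY{ $keys[$_] } for grep { $subset & (1 << $_) } 0 .. $#keys;
    $seen{$m} = 1;                       # full ORed mask
    $rest{ $m & ~$ctrl_alt }{$m} = 1;    # same mask with CTRL and ALT ignored
}

my $distinct  = keys %seen;
my $no_ctrl   = keys %rest;
my @ambiguous = grep { keys %{ $rest{$_} } > 1 } keys %rest;

print "distinct ORed masks:           $distinct\n";   # 33
print "distinct ignoring CTRL and ALT: $no_ctrl\n";   # 32
printf "ambiguous remainder: 0x%02X (LOYA|X1 is 0x%02X)\n",
    $ambiguous[0], mask(qw(LOYA X1));
```

The single ambiguous remainder comes from lCtrl+Win (no ALT) versus rAlt (with ALT).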
Leaving out 5 combinations of lCtrl, Win, lAlt [8, minus the empty one, and lCtrl+lAlt, which is avoided by most applications due to its similarity to AltGr=rAlt, and lCtrl+Win+lAlt, which is indistinguishable by the mask from lAlt+rAlt] to have bindable keypresses in applications, and having rCtrl as equivalent to lCtrl, this gives 27 Shift-pairs which may produce characters.
NOTE: lCtrl+Win+lAlt being indistinguishable by the mask from lAlt+rAlt is not a big deal, since there are no standard keyboard shortcuts involving Ctrl+Win+Alt.
NOTE: Combinations of lCtrl with rCtrl cause several problems.
NOTE: Removing the binding for the Win key, only 21 useful Shift-pairs remain. (This is what version 0.63 of izKeys keyboard layout is using; out of 24 distinct combinations, lAlt, lCtrl and rCtrl should be excluded.) Trivia: While this may look like complete overkill, recall that characters outside BMP can be inserted on Windows only via one keypress, possibly with many modifiers. (This restriction relates only to the “classical” flavor of Windows keyboard layouts.) Unicode defines 18 additional Latin/Greek alphabets for mathematical discourse. If a keyboard layout would want to support these letters, this would quickly exhaust the possible combinations of modifiers. (For a 2-script layout, one could live with Latin/AltGr-Latin/Greek + 18 mathematical alphabets. But for layouts supporting more scripts, it looks like using the Win key is not avoidable.)
NOTE: Applications may call ToUnicode() with impossible combinations of modifiers: for example, they may put Ctrl down, but not specify whether it is rCtrl or lCtrl. Likewise for Alt.
To support that, one would need to define a mask for standalone VK_CONTROL and VK_MENU (i.e., Ctrl and Alt). Since these modifiers are present when the real “left-right-handed” keys are down, the masks should be “contained” in the masks of handed keys. Example: one can make the pseudo-key Ctrl generate the bit CTRL, and the pseudo-key Alt generate the bit ALT. Then for any combination of modifiers with unhanded Ctrl and/or Alt, either the corresponding combination of bits is supported by the layout (and then the application will access the corresponding modification column, which is probably not the “expected” column corresponding to some handed flavor), or the combination is not yet defined. In the latter case, one may actually decide how to resolve this: one can map this combination of modifiers to an arbitrary modification column!
In particular, one can map such a combination of modifiers to a certain choice of handedness of Ctrl and Alt. (An example of such a problematic application is Firefox; look for “impossible modifier”.)
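For the assignment of masks proposed in this section, both conditions can be checked mechanically: that the unhanded masks CTRL and ALT are contained in the masks of the handed keys, and that the bit combinations CTRL, ALT, CTRL|ALT are never produced by real keys (so the layout is free to map them to any column). A minimal sketch; the bit values and the names %HANDED, %UNHANDED, %FLAVORS are assumptions made for this illustration.

```perl
use strict;
use warnings;

# Illustrative bit values; only distinctness matters.
my %BIT = (CTRL => 0x02, ALT => 0x04, KANA => 0x08, ROYA => 0x10,
           LOYA => 0x20, X1  => 0x40, X2   => 0x80);
sub mask { my $m = 0; $m |= $BIT{$_} for @_; return $m }

my %HANDED = (
    lCtrl => mask(qw(CTRL LOYA)),
    Win   => mask(qw(CTRL X1)),
    lAlt  => mask(qw(ALT KANA)),
    rAlt  => mask(qw(CTRL ALT LOYA X1)),
    Menu  => mask(qw(CTRL ALT X2)),
    rCtrl => mask(qw(CTRL ALT ROYA)),
);
my %UNHANDED = (Ctrl => mask('CTRL'), Alt => mask('ALT'));
my %FLAVORS  = (Ctrl => [qw(lCtrl rCtrl)], Alt => [qw(lAlt rAlt)]);

# 1. The mask of each pseudo-key must be contained in the masks of its handed flavors.
my $contained = 1;
for my $pseudo (sort keys %FLAVORS) {
    for my $real (@{ $FLAVORS{$pseudo} }) {
        $contained = 0
            unless ($HANDED{$real} & $UNHANDED{$pseudo}) == $UNHANDED{$pseudo};
    }
}

# 2. Collect every ORed mask reachable by pressing real keys only.
my @keys = sort keys %HANDED;
my %reachable;
for my $subset (0 .. 2**@keys - 1) {
    my $m = 0;
    $m |= $HANDED{ $keys[$_] } for grep { $subset & (1 << $_) } 0 .. $#keys;
    $reachable{$m} = 1;
}

# Masks produced only by "impossible" (unhanded) modifiers.
my @free = grep { !$reachable{$_} }
    mask('CTRL'), mask('ALT'), mask(qw(CTRL ALT));

print "containment holds: ", ($contained ? "yes" : "no"), "\n";
print "unhanded-only masks free to map: ", scalar(@free), " of 3\n";
```

Since all three masks are unused by the handed keys, each of them (with and without Shift) can be pointed at the column of whichever handed flavor one prefers.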
NOTE: The maximal number of “modification columns” supported by Windows is 126. A larger number would make the size of VK_TO_WCHARS... overflow the maximal number storable in the field VK_TO_WCHAR_TABLE.cbSize of type BYTE = unsigned char.
Given that the column 15 is ignored, this reduces the number of strings associated to a keypress (with different “modifiers”) to 125.
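The arithmetic behind the limit, as a sketch. It assumes the layout of one table entry as declared in the DDK header kbd.h (two BYTE fields followed by one WCHAR per modification column); the variable and function names are made up for this example.

```perl
use strict;
use warnings;

# Assumed layout of one VK_TO_WCHARS<n> entry:
#   BYTE VirtualKey; BYTE Attributes; WCHAR wch[n];
my $HEADER_BYTES = 2;     # VirtualKey + Attributes
my $WCHAR_BYTES  = 2;     # one 16-bit character per modification column
my $BYTE_MAX     = 255;   # largest value storable in cbSize

sub entry_size {
    my ($columns) = @_;
    return $HEADER_BYTES + $WCHAR_BYTES * $columns;
}

my $max_columns = int( ($BYTE_MAX - $HEADER_BYTES) / $WCHAR_BYTES );

printf "max columns: %d (entry size %d bytes)\n",
    $max_columns, entry_size($max_columns);          # 126 columns, 254 bytes
printf "one more column would need %d bytes\n",
    entry_size($max_columns + 1);                    # 256, does not fit in a BYTE
```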
WINDOWS GOTCHAS
First of all, keyboard layouts on Windows are controlled by DLLs; the only function of these DLLs is to export a table of "actions" to perform. This table is passed to the kernel, and that's it - whatever is not supported by the format of this table cannot be implemented by native layouts. (The DLL performs no "actions" when actual keyboard events arrive.)
Essentially, the logic is like this: there are primary "keypresses", and chained "keypresses" ("prefix keys" [= deadkeys] and keys pressed after them). Primary keypresses are distinguished by which physical key on the keyboard is pressed, and which of the "modifier keys" are also pressed at this moment (as well as the state of "latched keys" - usually CapsLock only, but maybe also Kana). This combination determines which Unicode character is generated by the keypress, and whether this character starts a "chained sequence".
On the other hand, the behaviour of chained keys is governed ONLY by Unicode characters they generate: if there are several physical keypresses generating the same Unicode characters, these keypresses are completely interchangeable inside a chained sequence. (The only restriction is that the first keypress should be marked as "prefix key"; for example, there may be two keys producing - so that one is producing a "real dash sign", and another is producing a "prefix" -.)
The table allows: to map ScanCodes to VK_keys; to associate a VK_key to several (numbered) choices of characters to output, and mark some of these choices as prefixes (deadkeys). (These "base" choices may contain up to 4 16-bit characters (with 32-bit characters mapped to 2 16-bit surrogates); but only those with 1 16-bit character may be marked as deadkeys.) For each prefix character (not a prefix key!) one can associate a table mapping input 16-bit "base characters" to output 16-bit characters, and mark some of the output choices as prefix characters.
The numbered choices above are determined by the state of "modifier keys" (such as Shift, Alt, Control), but not directly. First of all, VK_keys may be associated to a certain combination of 6 "modifier bits" (called "logical" Shift, Alt, Control, Kana, User1 and User2, but the logical bits are not required to coincide with names of modifier keys). (Example: one can bind Right Control to activate Shift and Kana bits.) The 64 possible combinations of modifier bits are mapped to the numbered choices above.
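The two-step lookup (keys to bits, then ORed bits to a numbered choice) can be sketched as follows. For brevity the model uses 4 of the 6 bits; the contents of both tables are hypothetical, and the names %VK_TO_BIT, @MOD_NUMBER, column_for() are made up for this example. The bit values and the marker SHFT_INVALID = 0x0F follow the DDK header kbd.h as far as I know.

```perl
use strict;
use warnings;

use constant { KBDSHIFT => 1, KBDCTRL => 2, KBDALT => 4, KBDKANA => 8 };
use constant SHFT_INVALID => 0x0F;    # marks "no column for this combination"

# Step 1 (hypothetical): which logical bits each modifier key activates.
my %VK_TO_BIT = (
    Shift    => KBDSHIFT,
    Control  => KBDCTRL,
    Alt      => KBDALT,
    RControl => KBDSHIFT | KBDKANA,   # the "Right Control" example from the text
);

# Step 2 (hypothetical): ORed bits => numbered choice (modification column).
my @MOD_NUMBER = (SHFT_INVALID) x 16;
$MOD_NUMBER[0]                            = 0;
$MOD_NUMBER[KBDSHIFT]                     = 1;
$MOD_NUMBER[KBDCTRL]                      = 2;
$MOD_NUMBER[KBDCTRL | KBDALT]             = 3;
$MOD_NUMBER[KBDSHIFT | KBDCTRL | KBDALT]  = 4;
$MOD_NUMBER[KBDKANA]                      = 5;
$MOD_NUMBER[KBDSHIFT | KBDKANA]           = 6;

# Return the column for a list of pressed modifier keys, or undef if none is defined.
sub column_for {
    my $bits = 0;
    $bits |= $VK_TO_BIT{$_} for @_;
    my $col = $MOD_NUMBER[$bits];
    return (defined $col && $col != SHFT_INVALID) ? $col : undef;
}

for my $combo ([], ['Shift'], ['RControl'], [qw(Control Alt)], ['Alt']) {
    my $col = column_for(@$combo);
    printf "%-12s => %s\n", join('+', @$combo) || '(none)',
        defined $col ? "column $col" : 'no column';
}
```

Here RControl alone lands in the same column as Shift+Kana would, which is exactly the sense in which the logical bits are decoupled from the names of the keys.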
Additionally, one can define two "separate numbered choices" in presence of CapsLock (but the only allowed modifier bit is Shift). Another way to determine what CapsLock is doing: one can mark that it flips the "logical Shift" bit (separately on no-modifiers state, Control-Alt-only state, and Kana-only state [?!] - here "only" allows for the Shift bit to be ON).
The AltGr key is considered equivalent to the Control-Alt combination (if those are present, or always???), and one cannot bind Alt and Alt-Shift combinations. Additionally, binding the bare Control modifier on alphabetical keys (and SPACE, [, ], \) may confuse some applications.
NOTE: there is some additional stuff allowed to be done (but only in presence of Far_East_Support installed???). FE-keyboards can define some sticky state (so may define some other "latching" keys in addition to CapsLock). However, I did not find clear documentation yet (keyboard106 in the DDK toolkit???).
There is a tool to create/compile the required DLL: kbdutool.exe of MicroSoft Keyboard Layout Creator (with a graphic frontend MSKLC.exe). The tool does not support customization of modifier bits, and has numerous bugs concerning binding keys which usually do not generate characters. The graphic frontend does not support chained prefix keys, adds another batch of bugs, and has arbitrary limitations: it refuses to work if the compiled version of the keyboard is already installed; it refuses to work if SPACE is redefined in useful ways.
WORKFLOW: uninstall the keyboard, comment out the definition of SPACE, load in MSKLC and create an install package. Then uncomment the definition of SPACE, and compile 4 architecture versions using kbdutool, moving the DLLs into suitable directories of the install package. Install the keyboard.
For development cycle, one does not need to rebuild the install package while recompiling.
The following sections classify GOTCHAS into 3 categories:
"WINDOWS GOTCHAS for keyboard users"
"WINDOWS GOTCHAS for keyboard developers using MSKLC"
"WINDOWS GOTCHAS for keyboard developers (problems in kernel)"
WINDOWS GOTCHAS for keyboard users
MSKLC keyboards not working on Windows 8 without reboot
The layout is shown as active, but "preview" is grayed out, and is not shown on the Win-Space list. See also:
http://www.errordetails.com/125726/activate-custom-keyboard-layout-created-with-msklc-windows
The workaround is to reboot. Compare with
http://blogs.msdn.com/b/michkap/archive/2012/03/12/10281199.aspx
Default keyboard of an application
Apparently, there is no way to choose a default keyboard for a certain language. The configuration UI allows moving keyboards up and down in the list, but, apparently, this order is not related to which keyboard is selected when an application starts. (This may be fixed on Windows 8?)
Hex input of unicode is not enabled
One needs to explicitly tinker with the registry (see examples/enable-hex-unicode-entry.reg) and then reboot to enable this.
Standard fonts have some chars exchanged
At least in Consolas and Lucida Sans Unicode φ and ϕ are exchanged. Compare with Courier and Times. (This may be due to the difference between Unicode's pre-v3.0 choice of representative glyphs, or the difference between French/English Apla=Didot/Porson's approaches.)
The console font configuration
According to MicroSoft, it is controlled by Registry hive
HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont
The key 0 usually gives Lucida Console, and the key 00 gives Consolas. Adding random numbers does not work; however, if one adds one more zero (at least when adding to a sequence of zeros), one can add more fonts. You need to export this hive (e.g., use
reg export "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont" console-ttf.reg
), save a copy (so you can always restore if the love goes sour) then edit the resulting file.
So if the maximal key with 0s is 00, add one extra row with an extra 0 at the end, and the family name of your font. The "family name" is what the Font list in Control Panel shows for font families (a "stacked" icon is shown); for individual fonts the weight (Regular, Book, Bold etc) is appended. So I add a line
"000"="DejaVu Sans Mono"
the result is (omitting Far Eastern fonts)
Windows Registry Editor Version 5.00
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont]
"949"="..."
"0"="Lucida Console"
"950"="..."
"932"="..."
"936"="..."
"00"="Consolas"
"000"="DejaVu Sans Mono"
The full file is in examples/console-fonts00-added.reg. After importing this file via reg (or giving it as a parameter to regedit; both require administrative privileges) the font is immediately available in the menu. (However, it does not work in "existing" console windows, only in newly created windows.)
(Do not use the example file directly. First inspect the hive exported on your system, and find the number of 0s to use. Then add a new line with the correct number of zeros - as a value, one can use the string above. This will preserve the defaults of your setup. Keep in mind that selection-by-fontfamily is buggy: if you have more than one version of the font in different weights, it is Russian roulette which one of them will be taken (at least for DejaVu, which uses Book as the default weight). First install the "normal" flavor of the font, then do as above (so the system has no way of picking the wrong flavor!), and only after this install the remaining flavors.)
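Finding the right number of zeros can be automated. The sketch below works on the already decoded text of the exported hive (reg export writes UTF-16, so a real script would have to decode the file first, e.g. by opening it with an :encoding(UTF-16) layer). The function name new_font_line() is made up for this example, and the sample data is the abridged hive shown above.

```perl
use strict;
use warnings;

# Given the decoded text of the exported TrueTypeFont hive and a font family name,
# build the line to add: one more zero than the longest all-zero value name.
sub new_font_line {
    my ($reg_text, $family) = @_;
    my $longest = '';
    while ($reg_text =~ /^"(0+)"=/mg) {
        $longest = $1 if length($1) > length($longest);
    }
    return sprintf '"%s"="%s"', $longest . '0', $family;
}

my $exported = <<'END';
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont]
"949"="..."
"0"="Lucida Console"
"950"="..."
"00"="Consolas"
END

print new_font_line($exported, 'DejaVu Sans Mono'), "\n";   # "000"="DejaVu Sans Mono"
```

The script only prints the line; editing the file and importing it back is still done by hand, after saving a copy of the original export.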
NOTE: keep in mind that I distribute a good-for-console “merge” of two fonts: DejaVu+unifont; DejaVu brings in nicely shaped, nicely scalable glyphs, and unifont brings a complete coverage of BMP (of Unicode v6.3). (We omit Han/Hangul since it does not fit in a narrow box of a console font. Additionally, the versions of February 2014 do not include Katakana/Bopomofo/Hangul-Compatibility-Jamo since, apparently, Windows does not allow these characters in a console font.)
CAVEAT: the string to put into Console\TrueTypeFont is the Family Name of the font. The family name is what is shown in the Fonts list of the Control Panel - but only for families with more than one font; otherwise the “metric name” of the font is appended.
On Windows, it is tricky to find the family name using the default Windows tools, without inspecting the font in a font editor. One workaround is to select the font in the Character Map application, then inspect HKEY_CURRENT_USER\Software\Microsoft\CharMap\Font via:
reg export HKCU\Software\Microsoft\CharMap character-map-font.reg
Note: the MicroSoft KB article mentioned above lists the wrong way to find the family name. What is visible in the Properties dialogue of the font, and in CurrentVersion\Fonts, is the Full Font Name. Fortunately, quite often the full name and the family name coincide; this is what happened with DejaVu. To find the "Full name" of the font, one can look into the hive
HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Fonts
reg export "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Fonts" fonts.reg
For example, after installing DejaVuSansMono.ttf, I see DejaVu Sans Mono (TrueType) as a key in this hive.
One more remark: for desktop icons coming from the “Public” user (“shared” icons) which start a console application, the default font is not directly editable. To reset it, one must:
copy the .lnk icon file to “your” desktop directory;
start the application using the “new” icon;
change the font via “Properties” of the window's menu;
as administrator, copy the .lnk file back to the Public/Desktop directory (usually in something like C:/Users). Manually refresh the desktop. Verify that the “old” icon works as expected. (Now you can remove the “new” icon created on the first step.)
There is no way to show Unicode contents on Windows
Until Firefox v13, one could use Firefox to show arbitrary Unicode text (limited only by which fonts are installed on your system). If you upgraded to a newer version, there is no (AFAIK) Windows program (for general public consumption) which would visualize Unicode text. The applications are limited either (in the worst case) by the characters supported by the currently selected font, or (in the best case) they can additionally show characters, but only those considered by the system as "important enough" (coming from a few of the default fonts?).
There is a workaround for this major problem in Firefox (present at least up to v20). The problem is caused by this “improvement” which blatantly saves a few seconds of load time for a tiny minority of users, the price being an inability to show Unicode for everybody (compare with comments 33 and 75 on the bug report above).
It is not documented, but this action is controlled by the about:config setting gfx.font_rendering.fallback.always_use_cmaps. To enable Unicode, set this setting to true (if you have it in the list as false, double-clicking it would do this - do a search to determine this; otherwise you need to create a new Boolean entry).
There is an alternative/additional way to enable extra fonts; it makes sense if you know a few character-rich fonts present on your system. The (undocumented) settings font.name-list.*.x-unicode (apparently) control fallback fonts for situations when a suitable font cannot be found via more specific settings. For example, when you have installed the (free) DejaVu, Junicode, Symbola fonts on your system, you may set (these variables are not present by default; you need to create new String variables):
font.name-list.sans-serif.x-unicode DejaVu Sans,Symbola,DejaVu Serif,DejaVu Sans Mono,Junicode,unifont
font.name-list.serif.x-unicode DejaVu Serif,Symbola,Junicode,DejaVu Sans,Symbola,DejaVu Sans Mono,unifont
font.name-list.cursive.x-unicode Junicode,Symbola,DejaVu Sans,DejaVu Serif,DejaVu Sans Mono,unifont
font.name-list.monospace.x-unicode DejaVu Sans Mono,DejaVu Sans,Symbola,DejaVu Serif,Junicode,unifont
And maybe also Fantasy
font.name-list.fantasy.x-unicode Symbola,DejaVu Serif,Junicode,DejaVu Sans Mono,DejaVu Sans,unifont
(Above, we also use unifont as the font of last resort; it is very useful since it contains all Unicode v6.3 characters in BMP. However, the standard distribution contains glyphs for undefined characters, which get in the way when the browser tries to find the best way to show a character. Use the non-mono variant from my build of Unifont; as opposed to the standard version, it is a properly designed TrueType font, and it scales much better to heights different from 16px.)
If you set both the font.* variables with rich enough fonts, and gfx.font_rendering.fallback.always_use_cmaps, then you may have the best of both worlds: the situation when a character cannot be shown via font.* settings will be extremely rare, so the possibility of delay due to gfx.font_rendering.fallback.always_use_cmaps is irrelevant.
Firefox misinterprets keypresses
Multiple prefix keys are not supported.
AltGr-0 and Shift-AltGr-0 are recognized as a character-generating keypress (good!), but the character they produce bears little relationship to what the keyboard produces. (In our examples, the character may be available only via multiple prefix keys!)
After a prefix key, Control-(Shift-)letter is not recognized as a character-generating key.
Kana-Enter is not recognized as a character-generating key.
Alt-+-HEXDIGITS is not recognized as a character-generating key sequence (recall that Alt should be pressed all the time, and the other keys, + and HEXDIGITS, should be pressed+released sequentially).
When the keyboard has an “extra” modifier key in addition to Shift/Alt/Ctrl (an analogue of the Kana key), combining it with Ctrl or with Alt is interpreted by Firefox as if only Ctrl or Alt were pressed.
When the keyboard generates different characters on AltGr than on Control-Alt (possible with assigning extra modifier bits to AltGr), Firefox interprets any AltGr-Key as if it were Control-Alt-Key.
Exception: when AltGr-Fkey produces a character, this character is understood correctly by FF. Same for AltGr-arrowKey (but again, while this works on the numeric keypad, it is still buggy if NumLock is on, or if the key is Numpad-Enter).
The keyboard may have rCtrl which produces the same characters as lCtrl, but which behaves differently when combined with other keys. Firefox ignores these differences.
This is combinable with other remarks above: e.g., rCtrl-Kana is interpreted by Firefox as lCtrl.
In addition to this, Firefox replaces rCtrl and lCtrl modifiers by an impossible modifier: Firefox pretends that only unhandedCtrl is down. (Here unhandedCtrl is a fake key VK_CONTROL which Windows pretends is down when either one of rCtrl or lCtrl is down.) Since the situation when unhandedCtrl is down, but neither rCtrl nor lCtrl is down, is not possible, this may access parts of the keyboard layout not visible to other applications. (Same for lAlt and rAlt.)
The net effect is that key combinations involving Ctrl or Alt keys may behave wrongly in Firefox. For example, with version 0.63 of izKeys keyboard layout, Ctrl and Alt are ignored on character-producing keys.
If lCtrl-lAlt-comma produces an em dash surrounded by hair spaces (this is U+200A U+2014 U+200A), and AltGr-comma produces the “cedilla deadkey”, then pressing AltGr-comma c acts as both: first U+200A U+2014 U+200A are inserted, then ç.
A subtle variation of the previous failure mode: If lCtrl-lAlt-` produces deadkey X, and AltGr-` produces the deadkey Y, then combining AltGr-` with a gives the expected Y*a combination. However, if combining with something more complicated (Control-Alt-a or Kana-f), with which deadkey Y is not combinable, THEN the bugs strike:
in the first case the deadkey behaves as X: it produces a pair of characters Xα; here Control-Alt-a produces α. (Keep in mind that inserting two characters is the expected behaviour outside of Firefox, but Firefox usually “eats” an undefined deadkey combination; and note that it is X, not the expected Y!)
in the second case it produces only the character ф generated by Kana-f. Here the behaviour is neither as outside Firefox (where it would produce Yф) nor as usual in Firefox (where it would eat the undefined sequence).
Of these problems, Chrome has only the Control-(Shift-)letter one, but a very cursory inspection shows other problems: Kana-arrows are not recognized as character-generating keys. (And IE9 just crashes in most of these situations…)
AltGr-keypresses triggering some actions
For example, newer versions of Windows have a graphics driver reacting to Ctrl-Alt-Arrows by rotating the screen. Usually, when you know which application is stealing your keypresses, you can find a way to disable or reconfigure this action.
For screen rotation: Right-Click on desktop, “Graphics Options”, “Hot Keys”, disable. The way to reconfigure this is to use “Graphics Properties” instead of “Graphics Options” (but this may depend on your graphics subsystem).
AltGr-keypresses going nowhere
Some AltGr-keypresses do not result in the corresponding letter on the keyboard being inserted. It looks like they are stolen by some system-wide hotkeys. See:
http://www.kbdedit.com/manual/ex13_replacing_altgr_with_kana.html
If these keypresses would perform some action, one might be able to deduce how to disable the hotkeys. So the real problem comes when the keypress is silently dropped.
I found one scenario in which this might happen, and how to fix this particular situation. (Unfortunately, it did not fix what I see, when AltGr-s [but not AltGr-S] is stolen.) When installing a shortcut, one can associate a hotkey to the shortcut. Unfortunately, the UI allows (and encourages!) hotkeys of the form <Control-Alt-letter> (which are equivalent to AltGr-letter) - instead of safe combinations like Control-Alt-F4 or Alt-Shift-letter (which, by convention, are ignored by keyboard drivers, and do not generate characters). If/when an application linked to by this shortcut is gone, the hotkey remains, but now it does nothing (no warning or dialogue comes).
If the shortcut is installed in one of "standard places", one can find it. Save this to K:\findhotkey.vbs (replace K: by the suitable drive letter here and below)
on error resume next
set WshShell = WScript.CreateObject("WScript.Shell")
Dim A
Dim Ag
Set Ag=Wscript.Arguments
If Ag.Count > 0 then
For x = 0 to Ag.Count -1
A = A & Ag(x)
Next
End If
Set FSO = CreateObject("Scripting.FileSystemObject")
f=FSO.GetFile(A)
set lnk = WshShell.CreateShortcut(A)
If lnk.hotkey <> "" then
msgbox A & vbcrlf & lnk.hotkey
End If
Save this to K:\findhotkey.cmd
set findhotkey=k:\findhotkey
for /r %%A in (*.lnk) do %findhotkey%.vbs "%%A"
for /r %%A in (*.pif) do %findhotkey%.vbs "%%A"
for /r %%A in (*.url) do %findhotkey%.vbs "%%A"
cd /d %UserProfile%\desktop
for /r %%A in (*.lnk) do %findhotkey%.vbs "%%A"
for /r %%A in (*.pif) do %findhotkey%.vbs "%%A"
for /r %%A in (*.url) do %findhotkey%.vbs "%%A"
cd /d %AllUsersProfile%\desktop
for /r %%A in (*.lnk) do %findhotkey%.vbs "%%A"
for /r %%A in (*.pif) do %findhotkey%.vbs "%%A"
for /r %%A in (*.url) do %findhotkey%.vbs "%%A"
cd /d %UserProfile%\Start Menu
for /r %%A in (*.lnk) do %findhotkey%.vbs "%%A"
for /r %%A in (*.pif) do %findhotkey%.vbs "%%A"
for /r %%A in (*.url) do %findhotkey%.vbs "%%A"
cd /d %AllUsersProfile%\Start Menu
for /r %%A in (*.lnk) do %findhotkey%.vbs "%%A"
for /r %%A in (*.pif) do %findhotkey%.vbs "%%A"
for /r %%A in (*.url) do %findhotkey%.vbs "%%A"
cd /d %APPDATA%
for /r %%A in (*.lnk) do %findhotkey%.vbs "%%A"
for /r %%A in (*.pif) do %findhotkey%.vbs "%%A"
for /r %%A in (*.url) do %findhotkey%.vbs "%%A"
cd /d %HOMEDRIVE%%HOMEPATH%
for /r %%A in (*.lnk) do %findhotkey%.vbs "%%A"
for /r %%A in (*.pif) do %findhotkey%.vbs "%%A"
for /r %%A in (*.url) do %findhotkey%.vbs "%%A"
(In most situations, only the section after the last cd /d is important; in my configuration all the "interesting" stuff is in %APPDATA%. Running this should find all shortcuts which define hot keys.)
Run the cmd file. Repeat in the "All users"/"Public" directory. It should show a dialogue for every shortcut with a hotkey it finds. (But, as I said, it did not fix my problem: AltGr-s works in the MSKLC test window, and nowhere else I tried...)
Control-Shift-keypresses starting bloatware applications
(Seen on IdeaPad.) Some pre-installed programs may steal Control-Shift-keypresses; it may be hard to find out the name of the application even when the stealing results in user-visible changes.
One way to deal with it is to start Task Manager in the Processes (or Details) panel, and click on the CPU column until one gets decreasing order of CPU percentage. Then one can try to detect which process is becoming active by watching the top rows when the action happens (or when one manages to get back to the desktop from the full-screen bloatware); one may need to repeat triggering this action several times in a row. After you know the name of the executable, you can google to find out how to disable it, and/or whether it is safe to kill this process.
Example: On IdeaPad, it was TouchZone.exe (safe to kill). It was stealing Control-Shift-R and Control-Shift-T.
Example: On MSI, a similar stealer was MGSysCtrl.exe (some claim it is used to show on-screen animation when special laptop keys are pressed; if you do not need them, it is safe to kill). It was stealing Control-Alt-s. (But to find this one, I needed to kill all suspicious apps one by one…)
WINDOWS GOTCHAS for keyboard developers using MSKLC
Several similar MSKLC created keyboards may confuse the system
Apparently, the system may get majorly confused when the description
of the project gets changed without changing the DLL (=project) name.
(Tested only with Win7 and the name in the DESCRIPTIONS section coinciding with the name on the KBD line - both in *.klc file.)
The symptoms: I know how one can get 5 different lists of keyboards:
1. Click on the keyboard icon in the Language Bar - usually shown on the toolbar; positioned to the right of the language code EN/RU etc. (The keyboard icon is not shown if only one keyboard is associated to the current language.)
2. Go to the Input Language settings (e.g., right-click on the Language bar, Settings, General).
3. On this General page, press the Add button, go to the language in question.
4. Check the .klc files for recently installed Input Languages.
5. In MS Keyboard Layout Creator, go to the File/Load Existing Keyboard list.
It looks like the first 4 get in sync if one deletes all related keyboards, then installs the necessary subset. I do not know how to fix 5 - MSKLC continues to show the old name for this project.
Another symptom: the current language indicator (like EN) on the language bar disappears. (Reboot time?)
Is it related to the ***\Local Settings\MuiCache\*** hive???
Possible workaround: manually remove the entry in HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Control\Keyboard Layouts (the last 4 digits match the codepage in the .klc file).
Too long description (or funny characters in description?)
If the name in the DESCRIPTIONS section is too long, the name shown in the list 2 above may be empty.
(Checked only on Win7 and when the name in the DESCRIPTIONS section coincides with the name on the KBD line - both in *.klc file. Length=63 works fine, Length=64 triggers the bug.)
(Fixed by shortening the name [but see "Several similar MSKLC created keyboards may confuse the system" above!], so maybe it was not the length but some particular character (+?) which was confusing the system. I saw a report on an MSKLC bug when the description had an apostrophe character '.)
MSKLC ruins names of dead key when reading a .klc
When reading a .klc file, MS Keyboard Layout Creator may ruin the names of dead keys. Symptom: open the dialogue for a dead key mapping (click the key, check that Dead key view has a checkmark, click on the ... button near the Dead key? checkbox); then the name (the first entry field) contains some junk. (Looks like a long ASCII string U+0030 U+0030 U+0061 U+0039.)
Workaround: if all one needs is to compile a .klc, one can run KBDUTOOL directly.
Workaround: correct ALL these names manually in MSKLC. If the names are the Unicode name for the dead character, just click the Default button near the entry field. Do this for ALL the dead keys in all the registers (including SPACE!). If CapsLock is not made "semantically meaningful", there are 6 views of the keyboard (PLAIN, Ctrl, Ctrl+Shift, Shift, AltGr, AltGr+Shift) - check them all for grayed out keys (=deadkeys).
Check for success: File/Save Source File As, use a temporary name. Inspect near the end of the generated .klc file. If OK, you can go to the Project/Build menu. (Likewise, this way lets you find which deadkeys' names need to be fixed.)
!!! This is time-consuming !!! Make sure that other things are OK before you do this (by Project/Validate, Project/Test).
BTW: It might be that this is cosmetic only. I do not know any bad effect - but I did not try to use any tool with visual feedback on the currently active sub-layout of keyboard.
Double bug in KBDUTOOL with dead characters above 0x0fff
This line in .klc file is treated correctly by MSKLC's builtin keyboard tester:
39 SPACE 0 0020 00a0@ 0020 2009@ 200a@ // , , , , // SPACE, NO-BREAK SPACE, SPACE, THIN SPACE, HAIR SPACE
However, via kbdutool it produces the following two bugs:
static ALLOC_SECTION_LDATA MODIFIERS CharModifiers = {
    &aVkToBits[0],
    7,
    {
    //  Modification# //  Keys Pressed
    //  ============= // =============
        0,            //
        1,            // Shift
        2,            // Control
        SHFT_INVALID, // Shift + Control
        SHFT_INVALID, // Menu
        SHFT_INVALID, // Shift + Menu
        3,            // Control + Menu
        4             // Shift + Control + Menu
    }
};
.....................................
{VK_SPACE ,0 ,' ' ,WCH_DEAD ,' ' ,WCH_LGTR ,WCH_LGTR },
{0xff ,0 ,WCH_NONE ,0x00a0 ,WCH_NONE ,WCH_NONE ,WCH_NONE },
.....................................
static ALLOC_SECTION_LDATA LIGATURE2 aLigature[] = {
{VK_SPACE ,6 ,0x2009 ,0x2009 },
{VK_SPACE ,7 ,0x200a ,0x200a },
Essentially, 2009@ 200a@ produce LIGATURES (= multiple 16-bit chars) instead of deadkeys. Moreover, these ligatures are put on the non-existing "modifications" 6 and 7 (the maximal modification defined is 4), so the code uses the Shift + Control + Menu flags instead of the "modification number" in the ligatures table.
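The second bug can be illustrated with a minimal Python sketch. It copies the CharModifiers table quoted above and translates a modifier bitmask into the modification number that should have been written into the ligature table. The helper names are hypothetical, and the sketch addresses only the column number, not the fact that these entries should have been dead keys in the first place.

```python
SHFT_INVALID = 0x0F  # 15: "this modifier combination has no column"

# Index = modifier bitmask (Shift=1, Control=2, Menu=4),
# value = modification number, copied from the CharModifiers table above.
MOD_NUMBER = [0, 1, 2, SHFT_INVALID, SHFT_INVALID, SHFT_INVALID, 3, 4]


def modification_number(bitmask):
    """Return the modification (column) number for a modifier bitmask."""
    if not 0 <= bitmask < len(MOD_NUMBER):
        return SHFT_INVALID
    return MOD_NUMBER[bitmask]


def fix_ligature_rows(rows):
    """Replace the bitmask emitted by kbdutool with the modification number.

    Hypothetical post-processing helper; each row is (vk, bitmask, chars...).
    """
    return [(vk, modification_number(mask), *chars) for vk, mask, *chars in rows]


rows = [("VK_SPACE", 6, 0x2009, 0x2009), ("VK_SPACE", 7, 0x200A, 0x200A)]
for row in fix_ligature_rows(rows):
    print(row)
```

With the table above, bitmasks 6 and 7 correspond to modification numbers 3 and 4, which are the values one would expect in the second field of aLigature.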
MSKLC keyboards handle Ctrl-Shift-letter, Ctrl-@ (x00), Ctrl-^ (x1e) and Ctrl-_ (x1f) differently than US keyboard
The US keyboard produces (as the “string value”) the corresponding Control-letter when Ctrl-Shift-letter is pressed. (In console applications, \x00 is not visible.) MSKLC does not reproduce this behaviour. This may break an application if it was not specifically tested with “complicated” keyboards.
The only way to fix this from the “naive” keyboard layout DLL (i.e., the kind that MSKLC generates) which I found is to explicitly include Ctrl-Shift as a handled combination, and return Ctrl-letter on such keypresses. (This is enabled in the keyboards generated by this module - not customizable in v0.12.)
"There was a problem loading the file" from MSKLC
Make line endings in .klc DOSish.
AltGr-keys do not work
Make line endings in .klc DOSish (when given as input to kbdutool - it gives no error messages, and deadkeys work [?!]).
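A minimal Python sketch of the line-ending fix follows. The function names are hypothetical, and the default encoding (UTF-16) is an assumption about how the .klc file was saved; adjust it to match your file.

```python
import re


def to_dos_line_endings(text):
    """Normalize every line ending (LF, lone CR, or CRLF) to CRLF."""
    return re.sub(r"\r\n|\r|\n", "\r\n", text)


def convert_file(src, dst, encoding="utf-16"):
    """Rewrite src into dst with DOSish line endings.

    newline="" disables Python's own newline translation on both ends,
    so only to_dos_line_endings() touches the line terminators.
    The encoding default is an assumption, not something kbdutool documents.
    """
    with open(src, encoding=encoding, newline="") as f:
        text = f.read()
    with open(dst, "w", encoding=encoding, newline="") as f:
        f.write(to_dos_line_endings(text))


print(repr(to_dos_line_endings("KBD\tfoo\nVERSION\t1.0\r\n")))
```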
Error 2011 (ooo-us, line 33): There are not enough columns in the layout list.
The maximal line length of kbdutool is exceeded (a line or two ahead). Try removing inline comments. If this helps, change the workflow to cut off long lines (250 bytes is OK).
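One possible way to automate this is sketched below in Python. It drops the trailing // comment from any line whose encoded length exceeds a limit. The function name is hypothetical; measuring the length in UTF-16 bytes and treating the first // as the start of a comment are both assumptions (the latter is unsafe for lines which contain // inside quoted strings, so one may want to apply it to the LAYOUT section only).

```python
def shorten_klc_line(line, limit=250, encoding="utf-16-le"):
    """Drop a trailing // comment if the encoded line exceeds `limit` bytes.

    Lines within the limit, and long lines without a comment, are returned
    unchanged (the latter still need manual attention).
    """
    if len(line.encode(encoding)) <= limit:
        return line
    head, sep, _comment = line.partition("//")
    return head.rstrip() if sep else line


long_line = "39\tSPACE\t0\t0020\t// " + "x" * 300
print(repr(shorten_klc_line(long_line)))
```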
Error 2012 (ooo-us-shorten.klc, line 115):
<ScanCode e065 - too many scancodes here to parse.>
from MSKLC. This means that the internal table of virtual keys mapped to non-e0 (sic!) scancodes is overloaded.
Time to switch to direct generation of .c file? Or you need to triage the “added” virtual keys, and decide which are less important so you can delete them from the .klc file.
Only the first 8 with-modifiers columns are processed by kbdutool
Time to switch to direct generation of .c file?
Only the first digit of the which-modifier-column is output by kbdutool in LIGATURES
Time to switch to direct generation of .c file?
kbdutool produces KEYNAME_DEAD section with meaningless entries for prefix keys 0x08, 0x0A, 0x0D
These entries do not stop keyboard from working. They look like L"'\b'" L"Name is here…"
...
Time to switch to direct generation of .c file?
It is not clear how to compile .C files emitted by kbdutool.exe
This distribution includes a script examples/compile_link_kbd.cmd which can do this. It is inspired by
http://stackoverflow.com/questions/3360746/how-can-i-compile-programmer-dvorak
http://levicki.net/articles/tips/2006/09/29/HOWTO_Build_keyboard_layouts_for_Windows_x64.php
It allows us to build using the cycle
Build skeleton .klc file.
Convert to C using kbdutool.c.
Patch against bugs in kbdutool.c.
Patch in features not supported by kbdutool.c.
Compile and link DLLs.
(This assumes that the installer was already built by MSKLC using a “simplified-to-nothing” .klc file which does not trigger the MSKLC bugs).
(See also http://accentuez.mon.nom.free.fr/Clavier-Galeron_fichiers/cr%E9ation_clavier.zip.)
kbdutool cannot ignore column=15 of the keybinding definition table
(Compare with "Windows ignores column=15 of the keybinding definition table".)
kbdutool requires that all the columns are associated to a modifier-bitmap. But column=15 should not be associated to any.
The workaround is to associate it to the bitmap which should not be bound to any column (like 4=KBDALT). In the output .C file, one would have 15 instead of SHFT_INVALID for the bitmap 4, but SHFT_INVALID is defined to be 15 anyway…
kbdutool ignores bits above 0x20 in the modification columns descriptor
Time to switch to direct generation of .C files?
kbdutool cannot assign more than one bitmask to a modification column
Time to switch to direct generation of .C files?
(Quite often, one combination of modifiers should produce the same characters as another one. The format of keyboard layout tables allows them to share a modification column. The format of .klc files does not allow sharing.)
kbdutool forgets to emit aVkToWch3/6/8
If the .klc file has many modification columns, the emitted aVkToWcharTable contains only aVkToWch1/2.
kbdutool confuses LIGATURES on unusual keys
For example, VK_SUBTRACT may be replaced by VK_F2 in the LIGATURES table.
Time to switch to direct generation of .C files?
kbdutool places KbdTables at end of the generated .c file
The offset of this structure should be no more than 0x10000. Thus keyboards with large tables of prefixed keys may fail to load. This may be related to the bug "If data in KEYNAME_DEAD takes too much space, keyboard is mis-installed, and “Language Bar” goes crazy".
Time to switch to direct generation of .C files?
Error "the required resource DATABASE is missing" from setup.exe
The localized DESCRIPTION in the .klc file contains a character outside of the repertoir of the codepage in question. Removing the offending characters, or removing the DESCRIPTION altogether, should fix this. (But either way, the name of the layout in the Settings of the Language Bar may become empty.) Having a different localized description has a side effect that the name of the layout shown in the Language Bar popups is localized.
(The localized description is what is put into the resource=1000 of the DLL file; it is this resource which is mentioned in the registry. There will be no such resource when the localized DESCRIPTION is missing.)
(The failure of setup.exe is not reproducible after a reboot!)
Apparently, this has nothing to do with the length, so the (older) conjectures below are wrong (although the .RC file generated by MSKLC has the [non-localized] name truncated after 40 chars in the field FileDescription - but not in other fields):
It looks like there is a buffer overflow in MSKLC, and sometimes the generated setup.exe in the install package would just exit with this error. The apparent reason is the length of the DESCRIPTION-like fields.
Workaround: it looks like the DESCRIPTION field is not used in setup.exe. So generate an “extra dummied” .klc file too (with shortened descriptions), make an install package from it, and mix the setup.exe from the “extra dummied” variant with the rest of the install package from a “less dummied” .klc file.
The alternative is to get rid of setup.exe completely, and ask users to run the appropriate .msi file from the install package by hand (choosing based on 32-bit vs 64-bit architecture).
Summary of the productive workflow with .klc:
If direct generation of .C files is out of question, the following workflow may be used (some of these steps may be omitted depending on how complicated your .klc layout is; for practical implementation, see the example of .klc creation and the example of .klc to .dll processing):
Make an “extra dummied” .klc (short descriptions, short dummy SHIFTSTATE, LAYOUT, DEADKEY, KEYNAME_DEAD sections, no LIGATURE section). Run it through GUI MSKLC (Alt-P Enter, then Alt-P B Enter Enter, Alt-F4). Store the generated setup.exe, rename the directory.
Make a “less dummied” .klc file (as above, but with the correct description). Do as above, and mix in the setup.exe from the previous step.
Run the “real” .klc file through the kbdutool CLI. Fix errors in the generated .C and .H files (using scripts and patches if needed). (One may need to remove a few lines in the LAYOUT section to avoid buffer overflows too.)
Compile the fixed .C files. (One may need to split them in two to decrease the offset of the static table in the DLL to the level Windows can handle: less than 64K.) Mix the generated .dll files with the install package made above.
WINDOWS GOTCHAS for keyboard developers (problems in kernel)
It is hard to understand what a keyboard really does
To inspect the output of the keyboard in the console mode (may be 8-bit, depending on how Perl is compiled), one can run
perl -MWin32::Console -wle 0 || cpan install Win32::Console
perl -we "sub mode2s($){my $in = shift; my @o; $in & (1<<$_) and push @o, (qw(rAlt lAlt rCtrl lCtrl Shft NumL ScrL CapL Enh ? ??))[$_] for 0..10; qq(@o)} use Win32::Console; my $c = Win32::Console->new( STD_INPUT_HANDLE); my @k = qw(T down rep vkey vscan ch ctrl); for (1..20) {my @in = $c->Input; print qq($k[$_]=), ($in[$_] < 0 ? $in[$_] + 256 : $in[$_]), q(; ) for 0..$#in; print(@in ? mode2s $in[-1] : q(empty)); print qq(\n)}"
This installs the Win32::Console module (if needed; it is included with ActiveState Perl), then reports the 20 following console events (press and keep the Alt key to exit by generating a “harmless” chain of events). Limitations: the reported input character is not processed via ToUnicode() (hence chained keys and multiple chars per key are reported only as low-level), and is reported as a signed 8-bit integer (so the report for above-8bit characters is completely meaningless).
T=1; down=1; rep=1; vkey=65; vscan=30; ch=240; ctrl=9; rAlt lCtrl
T=1; down=0; rep=1; vkey=65; vscan=30; ch=240; ctrl=9; rAlt lCtrl
This reports single (T=1) events for keypress/keyrelease (down=1/0) of AltGr-a. One can see that AltGr generates rAlt lCtrl modifiers (this is just a transcription of ctrl=9), that a is on virtual key 65 (this is VK_A) with virtual scancode 30, and that the generated character (it was æ) is 240.
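How ctrl=9 transcribes to rAlt lCtrl can be shown with a short Python sketch. It mirrors the mode2s helper of the Perl one-liner above, using the same bit names (bit 0 is rAlt, bit 3 is lCtrl); the two unnamed high bits of the one-liner are omitted here.

```python
# Bit names in the order used by the Perl one-liner above (bit 0 first).
CONTROL_KEY_BITS = ["rAlt", "lAlt", "rCtrl", "lCtrl", "Shft",
                    "NumL", "ScrL", "CapL", "Enh"]


def mode2s(state):
    """Transcribe a control-key-state bitmask into a list of modifier names."""
    return " ".join(name for bit, name in enumerate(CONTROL_KEY_BITS)
                    if state & (1 << bit))


print(mode2s(9))  # 9 = 1 + 8, i.e. bits 0 and 3
```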
The character is approximated to the current codepage. For example, this is Kana-b entering β = U+03b2 in codepage cp1252:
T=1; down=1; rep=1; vkey=66; vscan=48; ch=223; ctrl=0;
T=1; down=0; rep=1; vkey=66; vscan=48; ch=223; ctrl=0;
Note that 223 = 0xDF, and U+00DF = ß. So beta is substituted by eszet.
There is also a script examples/raw_keys_via_api.pl in this distribution which does a little bit more than this. One can also give this script the argument U (or Un, where n is the 0-based number among the listed keyboard layouts) to report ToUnicode() results, or the argument cooked to report what is produced by reading raw characters (as opposed to events) from the console.
It is not documented how to make a with-prefix-key(s) combination produce 0-length string
Use 0000@ (in .klc), or DEADKEY 0 in a .c file. Explanation: what a prefix key does is make the kernel remember a word (the state of the finite automaton) without producing any output character. Having no prefix key corresponds to the state being 0.
Hence making prefix_key=0 is the same as switching the finite automaton to the initial state, and not producing any character - and this is exactly what is requested in the question.
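The finite-automaton view can be made concrete with a toy Python model. This is only a sketch of the explanation above, not of the kernel's actual data structures: the binding table, the key names and the handling of unbound combinations are all invented for illustration.

```python
INITIAL = 0  # "no prefix key pressed"


def press(table, state, key):
    """Return (output, new_state) for one keypress in the toy automaton."""
    action = table.get((state, key))
    if action is None:       # unbound: the toy model emits nothing and resets
        return "", INITIAL
    kind, value = action
    if kind == "dead":       # prefix key: remember `value`, emit nothing
        return "", value
    return value, INITIAL    # ordinary binding: emit string, back to initial


def type_keys(table, keys):
    """Feed a sequence of keys; return (typed text, final state)."""
    state, out = INITIAL, []
    for key in keys:
        text, state = press(table, state, key)
        out.append(text)
    return "".join(out), state


table = {
    (INITIAL, "`"): ("dead", 0x0060),   # like 0060@ : a prefix key
    (0x0060, "a"): ("char", "à"),
    (0x0060, "x"): ("dead", 0),         # like 0000@ : no output, state 0
    (INITIAL, "a"): ("char", "a"),
}

print(type_keys(table, "`x"))   # empty string, automaton in initial state
print(type_keys(table, "`xa"))  # the following key behaves as unprefixed
```

In this model a binding to "dead key 0" is indistinguishable from producing an empty string: nothing is emitted, and the next keypress is processed from the initial state.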
If data in KEYNAME_DEAD takes too much space, keyboard is mis-installed, and “Language Bar” goes crazy
Installation reports success, the keyboard appears in the list in the Language Bar's "Settings". But the keyboard is not listed in the menu of the Language Bar itself. (This is not fixed by a reboot.)
Deinstalling (by MSKLC's installer) in such a case removes one (apparently, the last) of the listed keyboards for the language; at least it is removed from the menu of the Language Bar itself. However, the list in the “Settings” does not change! One can't restore the (wrongly) removed (unrelated!) layout by manipulating the latter list. (I did not try to check what will happen if only one keyboard for the language is available — is it removed for good?) This condition is fixed by a reboot: the “missing” “unrelated” layout jumps to existence.
I did not find a way to restore the deleted keyboard layout (without a reboot). Experimenting with these is kinda painful: with each failure, I add one extra keyboard to the list in the “Settings”; - so the list is growing and growing! [Better add useless-to-you keyboards, since until the reboot you will never be able to install them again.]
Update: this condition reappeared in the update from v0.61 to v0.63 of izKeys layouts. Between these versions, there was a very small increment of the size: one modification column was added, and two deadkeys were added. Removing a bunch of (useless?) dead keys descriptions fixed this again; but now I have my doubts on whether it was due to ONLY increasing the size of KEYNAME_DEAD… Maybe it is due to the total size of certain segments in the DLL.
(This may be related to the bug "kbdutool places KbdTables at end of the generated .c file".)
Windows ignores column=15 of the keybinding definition table
Note that 15 is SHFT_INVALID; this column number is used to indicate that this particular combination of modifiers does not produce keys. In particular, the generator must avoid this column number.
Workaround: put junk into this column, and use different columns for useful modifier combinations. The mapping from modifiers to columns should not be necessarily 1-to-1. (But see "kbdutool cannot ignore column=15 of the keybinding definition table".)
Windows combines modifier bitmaps for lCtrl, Alt and rAlt on AltGr
(At least when AltGr is special in the keyboard,) the modifier bitmap bound to this key is actually the bit-or of the bitmaps above. Essentially, this prohibits assigning interesting flag combinations to lCtrl.
The (very limited) workaround is to ensure that the flags one puts on AltGr contain all the flags assigned to the above VK codes. (This does not change anything, but at least makes the assignments less confusing for human inspection.)
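A small Python sketch of this check follows. The KBD* values are the modifier bits from kbd.h; the dictionary layout and the function names are hypothetical, chosen only to illustrate the bit-or described above.

```python
KBDSHIFT, KBDCTRL, KBDALT, KBDKANA, KBDROYA, KBDLOYA = (
    0x01, 0x02, 0x04, 0x08, 0x10, 0x20)


def effective_altgr_bitmap(bitmaps):
    """Bitmap Windows uses for AltGr: the bit-or over lCtrl, Alt and rAlt."""
    return bitmaps["lCtrl"] | bitmaps["Alt"] | bitmaps["rAlt"]


def altgr_flags_are_explicit(bitmaps):
    """True if the flags written on rAlt already equal the effective bitmap."""
    return bitmaps["rAlt"] == effective_altgr_bitmap(bitmaps)


confusing = {"lCtrl": KBDCTRL, "Alt": KBDALT, "rAlt": KBDALT | KBDKANA}
explicit = dict(confusing, rAlt=KBDCTRL | KBDALT | KBDKANA)

print(hex(effective_altgr_bitmap(confusing)), altgr_flags_are_explicit(confusing))
print(hex(effective_altgr_bitmap(explicit)), altgr_flags_are_explicit(explicit))
```

Both tables lead to the same effective bitmap; the second one merely spells it out, which is all the workaround can achieve.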
Windows ignores lAlt if its modifier bitmap is not standard
Adding KBDROYA to lAlt disables console sending non-modified char on keydown. Together with the previous problem, this looks like essentially prohibiting putting interesting bitmaps on the left modifier keys.
Workaround: one can add KBDKANA on lAlt. It looks like the combination KBDALT|KBDKANA is compatible with Windows' handling of Alt (both in console, and for accessing/highlighting the menu entries). (However, since only KBDALT is going to be stripped for handling of lAlt-key, the modification column for KBDKANA should duplicate the modification column for no-KBD-flags. Same with KBDSHIFT added.)
When AltGr produces ROYA, problems in Notepad
Going to the Save As dialogue in Notepad loses "speciality of AltGr" (it highlights Menu); one needs to switch layouts via LAlt+LShift to restore.
I do not know any workaround.
Console applications cannot detect when a keypress may be interpreted as a “command”
The typical logic of an (advanced) application is that it interprets certain keypresses (combinations of keys with modifiers) as “commands”. To do this in presence of user-switchable keyboards, when it is not known in compile time which key sequences generate characters, the application must be able to find at runtime which keypresses are characters-generating, and which are not. The latter keypresses are candidates to be checked whether they should trigger commands of the application.
For final keypresses of a character-generating key-sequence, an application gets a notification from the ReadConsoleEvent() API call that this keypress generates a character. However, for the keypresses of the sequence which are not the last one (“dead” keys), there is no such notification.
Therefore, there is no way to avoid dead keys triggering actions in an application. What is the difference with non-console applications? First of all, they get such a notification (with the standard TranslateMessage()/DispatchMessage() sequence of API calls, on WM_KEYDOWN, one can PeekMessage() for WM_SYSDEADCHAR/WM_DEADCHAR and/or WM_SYSCHAR/WM_CHAR). Second, the windowed application may call ToUnicode(Ex)() to calculate this information itself.
Well, why cannot a console application use the second method? First, the active keyboard layout of a console application is the default one. When the user switches the keyboard layout of the console, the application gets no notification of this, and its keyboard layout does not change. This makes ToUnicode() useless. Moreover, due to the security architecture, the console application cannot query the ID of the thread serving the message loop of the console, so cannot query GetKeyboardLayout() of this thread. Hence ToUnicodeEx() is useless too.
(There may be a lousy workaround: run ToUnicodeEx() on all the installed keyboard layouts, and check which of them are excluded by comparing with results of ReadConsoleEvent(). Interpret contradictions as user changing the keyboard layout. Of course, on several keypresses following a change of keyboard layout one may get unexpected results. And if two similar keyboards are installed, one may also never get definite answer on which of them is currently active.)
(To handle this workaround, one must have a way to call ToUnicode() in a way which does not change the internal state of the keyboard driver. Observe:
Such a way is not documented.
Watch the character reported by ReadConsoleEvent() on the KEYUP event for deadkeys. This is the character which a deadkey would produce if it is pressed twice (and is 0 if pressing it twice results in a deadkey again). The only explanation for this I can fathom is that the console's message queue thread calls such a non-disturbing-state version of ToUnicode().
Why should it be “non-disturbing”? Otherwise it would reset the state “this deadkey was pressed”, and the following keypress would be interpreted as not preceded by a deadkey. And this is not what happens. (If one does it with the usual ToUnicode() call, DOWN reports a deadkey, but UP reports “ignored”; to see this, run examples/raw_keys_via_api.pl with arguments Un 1 with a keyboard which produces ç on AltGr-, c. Here n is the number of the keyboard in the list of available keyboards reported by examples/raw_keys_via_api.pl U 1.)
Well, when one knows that some API calls are possible, it is just a SMP to find it out (see examples/raw_keys_via_api.pl). It turns out that the argument wFlags=0x02 achieves the behaviour of a console during the KeyUp event. (As a side benefit, it also avoids another glitch in Windows' keyboard processing: it reports the character value in presence of the Alt modifier - recall that ToUnicodeEx() ignores Alt unless Ctrl is present too. Well, I checked this so far only on the KeyUp event, where the console produces mysterious results.)
However, even without using undocumented flags, it is not hard to construct such a non-disturbing version of ToUnicode(). The only ingredient needed is a way to reset the state to the “no deadkeys pressed” one. Then just store keypresses/releases from the time the last such state was found, call ToUnicode(), reset state, and call ToUnicode() again for all the stored keypresses/releases; then update the stored state appropriately.
But I strongly doubt that the console's message loop does anything so advanced. My bet would be that it uses a non-documented call or non-documented flags. (Especially since the approach above does not handle Alt the same way as the console does.))
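The store-and-replay construction described above can be sketched in Python against a toy stateful translator. ToyKeyboard is a hypothetical stand-in for the keyboard driver (it is not a wrapper around the Windows API); it knows a single dead key, and its reset() plays the role of the "reset to no deadkeys pressed" ingredient.

```python
class ToyKeyboard:
    """Stand-in for the stateful driver: '`' is a dead key, '`'+'a' gives 'à'."""

    def __init__(self):
        self.pending = None

    def reset(self):
        self.pending = None

    def to_unicode(self, key):
        """Return (is_dead, text); changes the internal state."""
        if self.pending is None:
            if key == "`":
                self.pending = key
                return True, ""
            return False, key
        pending, self.pending = self.pending, None
        if (pending, key) == ("`", "a"):
            return False, "à"
        return False, pending + key


class NonDisturbing:
    """Adds a peek() which leaves the translator's state as it found it."""

    def __init__(self, kbd):
        self.kbd = kbd
        self.history = []  # keys fed since the state was last "no deadkeys"

    def feed(self, key):
        is_dead, text = self.kbd.to_unicode(key)
        if is_dead:
            self.history.append(key)
        else:
            self.history.clear()
        return is_dead, text

    def peek(self, key):
        result = self.kbd.to_unicode(key)  # disturbs the state...
        self.kbd.reset()                   # ...so reset it...
        for old in self.history:           # ...and replay the stored keys
            self.kbd.to_unicode(old)
        return result


nd = NonDisturbing(ToyKeyboard())
print(nd.feed("`"), nd.peek("a"), nd.peek("a"), nd.feed("a"))
```

Repeated peeks give the same answer, and the real keypress which follows is still interpreted as preceded by the dead key.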
Behaviour of Alt-Modifiers-Key vs Modifiers-Key
When both combinations produce characters (say, X and Y), it is not clear how an application should decide whether it got an Alt-Y event (for a menu entry starting with Y), or an X event.
A partial workaround (if the semantic of the layout fits into the limited number of bits in the ORed mask): make all the keys which may be combined with Alt have the KBDCTRL bit in the mask set; add some extra bit to Ctrl keys to be able to distinguish them. Then at least the kernel will produce the correct character on the ToUnicode() call (hence in TranslateMessage()). [The potential for an application to be confused is still large.]
Customization of what CapsLock is doing is very limited
(See the description of the semantic of CapsLock in "Keyboard input on Windows, Part II: The semantic of ToUnicode()".)
A partial workaround (if the semantic of the layout fits into the limited number of bits in the ORed mask): make all the modifier combinations (except for the base layer) have the KBDCTRL and KBDALT bits set; add some extra bits to Ctrl keys and Alt keys (apparently, only KBDKANA will work with Alt) to be able to distinguish them. Then the CAPLOKALTGR flag will affect all these combinations too.
lCtrl-rCtrl combination: multiple problems
First of all, sometimes Shift is ignored when used with this combination. (Fixed by reboot. When this happens, Shift does not work also with combinations with lAlt and/or Menu.) On the other hand, CapsLock works as intended. (I even got an impression that sometimes Shift works when CapsLock is active; cannot reproduce this, though.)
I suspect this is related to the binding of Shift-Ctrl to switch between keyboards of a language suddenly jumping into existence (without my interaction). Simultaneously, this option disappeared from the UI to change keyboard options ("Settings/Advanced Key Settings" in Language Bar in Windows 7). It might be that press/release of Shift is filtered out in presence of lCtrl-rCtrl?
(I also saw what looks like the Menu key being stuck in some situations - fixed by pressing it again. Do not know how to reproduce this. It is interesting to note that one of the bits in the mask of the Menu key is 0x80, and there is a define for this bit in kbd.h named KBDGRPSELTAP - but it is undocumented, and, judging by names, one might think that KBDGRPSELTAP would work in pair with the flag GRPSELTAP of VK_TO_WCHARSn->Attributes.)
NOTES: Apparently, key up/down for many combinations of lCtrl+rCtrl+char are not delivered to applications. Key up/down for `/5/6/-/=/Z/X/C/V/M/,/./Enter/rShift are not delivered here when used with lCtrl+rCtrl modifiers (at least in a console). Adding Shift/lAlt/Menu does not change this. Same for F1/F2/F8/F9 and Enter/Insert/Delete/Home/PgUp (but not for keypad ones!).
Moreover, when used with KeyPad→ or KeyPad*, this behaves as if both these keys were pressed. Same with the pair KeyPad- and Keypad+ (is it hardware-dependent???).
(Time to time lCtrl+rCtrl+NUMPADchar do not work - neither with nor without NumLock.)
No workarounds are known.
lAlt-rAlt combination: many keys are not delivered to applications
Apparently, key up/down for many combinations of lAlt+rAlt+char are not delivered to applications. For example, Numpad3 and Numpad7 - neither with nor without NumLock; same for G/H/'/B/N/slash (at least in a console). Adding Shift/lAlt/Menu does not change this. Same for F4/F5/F6.
No workarounds are known (except that Numpad3 and Numpad7 (without NumLock) may be replaced by Home and PgDown).
NOTE: in the bottom row of the keyboard, all the keys (except lShift) are either in the list above, or in the list for lCtrl+rCtrl modifiers.
Too long DESCRIPTION of the layout is not shown in Language Bar Settings
(the description is shown in the Language Bar itself). The examples are (behave the same)
Greek-QWERTY (Pltn) Grn=⇑␣=^ˡⒶˡ-=Lat; Ripe=Ⓐʳ␣=Mnu-=Rus(Phon); Ripe²=Mnu-^ʳ-=Hbr; k.ilyaz.org
US-Intl Grn=⇑␣=^ˡⒶˡ-=Grk; Ripe=Ⓐʳ␣=Mnu-=Rus(Phon); Ripe²=Mnu-^ʳ-=Hbr; k.ilyaz.org
(Or maybe it is the semicolons in the names???) If this happens, one can still assign distinctive icons to the layouts, and distinguish them via going to Properties.
UNICODE TABLE GOTCHAS
The position of Unicode consortium is, apparently, that the “name” of a Unicode character is “just an identifier”. In other words, its (primary) function is to identify a character uniquely: different characters should have different names, and that's it. Any other function is secondary, and “if it works, fine”; if it does not work, tough luck. If the name does not match how people use the character (and with the giant pool of defined characters, this has happened a few times), this is not a reason to abandon the name.
This position makes the practice of maintaining backward compatibility easy. There is documentation of obvious errors in the naming.
However, this module tries to extract a certain amount of orthogonality from the giant heap of characters defined in Unicode; the principal concept is “a mutator”. Most mutators are defined by programmatic inspection of names of characters and relations between names of different characters. (In other words, we base such mutators on names, not glyphs.) Here we sketch the irregularities uncovered during this process.
APL symbols with UP TACK and DOWN TACK look reverted w.r.t. other UP TACK and DOWN TACK symbols.
LESS-THAN, FULL MOON, GREATER-THAN, EQUALS, GREEK RHO, MALE are defined with SYMBOL or SIGN at end, but (may) drop it when combined with modifiers via WITH. Likewise for SUBSET OF, SUPERSET OF, CONTAINS AS MEMBER, PARALLEL TO, EQUIVALENT TO, IDENTICAL TO.
Sometimes the opposite happens, and SIGN appears out of the blue; compare:
2A18 INTEGRAL WITH TIMES SIGN
2A19 INTEGRAL WITH INTERSECTION
ENG is a combination of n with HOOK, but it is not marked as such in its name.
Sometimes a name of diacritic (after WITH) acquires an ACCENT at end (see U+0476).
Oftentimes the part to the left of WITH is not resolvable: sometimes it is underspecified (e.g., just TRIANGLE), sometimes it is overspecified (e.g., in LEFT VERTICAL BAR WITH QUILL), sometimes it should be understood as a glyph-of-written-word (e.g., in END WITH LEFTWARDS ARROW ABOVE). Sometimes it just does not exist (e.g., LATIN LETTER REVERSED GLOTTAL STOP WITH STROKE - there is LATIN LETTER INVERTED GLOTTAL STOP, but not the reversed variant). Sometimes it is a defined synonym (VERTICAL BAR).
Sometimes it has something appended (N-ARY UNION OPERATOR WITH DOT).
Sometimes WITH is just a clarification (RIGHTWARDS HARPOON WITH BARB DOWNWARDS).
1 AND
1 ANTENNA
1 ARABIC MATHEMATICAL OPERATOR HAH
1 ARABIC MATHEMATICAL OPERATOR MEEM
1 ARABIC ROUNDED HIGH STOP
1 ARABIC SMALL HIGH LIGATURE ALEF
1 ARABIC SMALL HIGH LIGATURE QAF
1 ARABIC SMALL HIGH LIGATURE SAD
1 BACK
1 BLACK SUN
1 BRIDE
1 BROKEN CIRCLE
1 CIRCLED HORIZONTAL BAR
1 CIRCLED MULTIPLICATION SIGN
1 CLOSED INTERSECTION
1 CLOSED LOCK
1 COMBINING LEFTWARDS HARPOON
1 COMBINING RIGHTWARDS HARPOON
1 CONGRUENT
1 COUPLE
1 DIAMOND SHAPE
1 END
1 EQUIVALENT
1 FISH CAKE
1 FROWNING FACE
1 GLOBE
1 GRINNING CAT FACE
1 HEAVY OVAL
1 HELMET
1 HORIZONTAL MALE
1 IDENTICAL
1 INFINITY NEGATED
1 INTEGRAL AVERAGE
1 INTERSECTION BESIDE AND JOINED
1 KISSING CAT FACE
1 LATIN CAPITAL LETTER REVERSED C
1 LATIN CAPITAL LETTER SMALL Q
1 LATIN LETTER REVERSED GLOTTAL STOP
1 LATIN LETTER TWO
1 LATIN SMALL CAPITAL LETTER I
1 LATIN SMALL CAPITAL LETTER U
1 LATIN SMALL LETTER LAMBDA
1 LATIN SMALL LETTER REVERSED R
1 LATIN SMALL LETTER TC DIGRAPH
1 LATIN SMALL LETTER TH
1 LEFT VERTICAL BAR
1 LOWER RIGHT CORNER
1 MEASURED RIGHT ANGLE
1 MONEY
1 MUSICAL SYMBOL
1 NIGHT
1 NOTCHED LEFT SEMICIRCLE
1 ON
1 OR
1 PAGE
1 RIGHT ANGLE VARIANT
1 RIGHT DOUBLE ARROW
1 RIGHT VERTICAL BAR
1 RUNNING SHIRT
1 SEMIDIRECT PRODUCT
1 SIX POINTED STAR
1 SMALL VEE
1 SOON
1 SQUARED UP
1 SUMMATION
1 SUPERSET BESIDE AND JOINED BY DASH
1 TOP
1 TOP ARC CLOCKWISE ARROW
1 TRIPLE VERTICAL BAR
1 UNION BESIDE AND JOINED
1 UPPER LEFT CORNER
1 VERTICAL BAR
1 VERTICAL MALE
1 WHITE SUN
2 CLOSED MAILBOX
2 CLOSED UNION
2 DENTISTRY SYMBOL LIGHT VERTICAL
2 DOWN-POINTING TRIANGLE
2 HEART
2 LEFT ARROW
2 LINE INTEGRATION
2 N-ARY UNION OPERATOR
2 OPEN MAILBOX
2 PARALLEL
2 RIGHT ARROW
2 SMALL CONTAINS
2 SMILING CAT FACE
2 TIMES
2 TRIPLE HORIZONTAL BAR
2 UP-POINTING TRIANGLE
2 VERTICAL KANA REPEAT
3 CHART
3 CONTAINS
3 TRIANGLE
4 BANKNOTE
4 DIAMOND
4 PERSON
5 LEFTWARDS TWO-HEADED ARROW
5 RIGHTWARDS TWO-HEADED ARROW
8 DOWNWARDS HARPOON
8 UPWARDS HARPOON
9 SMILING FACE
11 CIRCLE
11 FACE
11 LEFTWARDS HARPOON
11 RIGHTWARDS HARPOON
15 SQUARE
perl -wlane "next unless /^Unresolved: <(.*?)>/; $s{$1}++; END{print qq($s{$_}\t$_) for keys %s}" oxx-us2 | sort -n > oxx-us2-sorted-kw
SQUARE WITH
specifies the fill - not combining. FACE
is not combining, same for HARPOON
s.
Only CIRCLE WITH HORIZONTAL BAR
is combining. Triangle is combining only with underbar and dot above.
TRIANGLE
means WHITE UP-POINTING TRIANGLE
. DIAMOND
- WHITE DIAMOND
(so do many others.) TIMES
means MULTIPLICATION SIGN
; but CIRCLED MULTIPLICATION SIGN
means CIRCLED TIMES
- go figure! CIRCLED HORIZONTAL BAR WITH NOTCH
is not a decomposition (it is "something circled").
Another way of compositing is OVER
(but not UNDER
!) and FROM BAR
. See also ABOVE
, BELOW
- but only BELOW LONG DASH
. Avoid WITH/AND
after these.
TWO HEADED
should replace TWO-HEADED
. LEFT ARROW
means LEFTWARDS ARROW
, same for RIGHT
. DIAMOND SHAPE
means DIAMOND
- actually just a bug - http://www.reddit.com/r/programming/comments/fv8ao/unicode_600_standard_published/? LINE INTEGRATION
means CONTOUR INTEGRAL
. INTEGRAL AVERAGE
means INTEGRAL
. SUMMATION
means N-ARY SUMMATION
. INFINITY NEGATED
means INFINITY
.
HEART
means WHITE HEART SUIT
. TRIPLE HORIZONTAL BAR
looks genuinely missing...
SEMIDIRECT PRODUCT
means one of two, left or right???
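The renamings noted above can be kept in a lookup table, so that a name from the Unresolved list is translated before it is looked up again. A minimal sketch: canonical_name() and %alias are hypothetical helpers (not part of this module), the table holds only the pairs mentioned in the text, and following chains (DIAMOND SHAPE to DIAMOND to WHITE DIAMOND) is an assumption.

```perl
use strict;
use warnings;

# Informal names seen in "Unresolved" reports, mapped to the names the
# notes above say they stand for.  Hypothetical table, for illustration.
my %alias = (
    'TRIANGLE'                    => 'WHITE UP-POINTING TRIANGLE',
    'DIAMOND'                     => 'WHITE DIAMOND',
    'DIAMOND SHAPE'               => 'DIAMOND',
    'TIMES'                       => 'MULTIPLICATION SIGN',
    'CIRCLED MULTIPLICATION SIGN' => 'CIRCLED TIMES',
    'LEFT ARROW'                  => 'LEFTWARDS ARROW',
    'RIGHT ARROW'                 => 'RIGHTWARDS ARROW',
    'LINE INTEGRATION'            => 'CONTOUR INTEGRAL',
    'INTEGRAL AVERAGE'            => 'INTEGRAL',
    'SUMMATION'                   => 'N-ARY SUMMATION',
    'INFINITY NEGATED'            => 'INFINITY',
    'HEART'                       => 'WHITE HEART SUIT',
);

# Translate a name; unknown names are returned unchanged.
sub canonical_name {
    my ($name) = @_;
    $name =~ s/\bTWO-HEADED\b/TWO HEADED/g;    # "TWO HEADED should replace TWO-HEADED"
    my %seen;                                  # guard against accidental loops
    while (exists $alias{$name} and not $seen{$name}++) {
        $name = $alias{$name};
    }
    return $name;
}

print canonical_name($_), "\n"
    for 'TIMES', 'DIAMOND SHAPE', 'LEFTWARDS TWO-HEADED ARROW';
```

Names which are ambiguous (SEMIDIRECT PRODUCT) or missing (TRIPLE HORIZONTAL BAR) are deliberately left out of the table.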
This better be convertible by rounding/sharpening mutators, but see BUT NOT/WITH NOT/OR NOT/AND SINGLE LINE NOT/ABOVE SINGLE LINE NOT/ABOVE NOT
2268 LESS-THAN BUT NOT EQUAL TO; 1.1
2269 GREATER-THAN BUT NOT EQUAL TO; 1.1
228A SUBSET OF WITH NOT EQUAL TO; 1.1
228B SUPERSET OF WITH NOT EQUAL TO; 1.1
@ Relations
22E4 SQUARE IMAGE OF OR NOT EQUAL TO; 1.1
22E5 SQUARE ORIGINAL OF OR NOT EQUAL TO; 1.1
@@ 2A00 Supplemental Mathematical Operators 2AFF
@ Relational operators
2A87 LESS-THAN AND SINGLE-LINE NOT EQUAL TO; 3.2
x (less-than but not equal to - 2268)
2A88 GREATER-THAN AND SINGLE-LINE NOT EQUAL TO; 3.2
x (greater-than but not equal to - 2269)
2AB1 PRECEDES ABOVE SINGLE-LINE NOT EQUAL TO; 3.2
2AB2 SUCCEEDS ABOVE SINGLE-LINE NOT EQUAL TO; 3.2
2AB5 PRECEDES ABOVE NOT EQUAL TO; 3.2
2AB6 SUCCEEDS ABOVE NOT EQUAL TO; 3.2
@ Subset and superset relations
2ACB SUBSET OF ABOVE NOT EQUAL TO; 3.2
2ACC SUPERSET OF ABOVE NOT EQUAL TO; 3.2
Looking into v6.1 reference PDFs, 2268, 2269, 2AB5, 2AB6, 2ACB, 2ACC have two horizontal bars, while 228A, 228B, 22E4, 22E5, 2A87, 2A88, 2AB1, 2AB2 have one horizontal bar. Hence BUT NOT EQUAL TO
and ABOVE NOT EQUAL TO
are equivalent; so are WITH NOT EQUAL TO
, OR NOT EQUAL TO
, AND SINGLE-LINE NOT EQUAL TO
and ABOVE SINGLE-LINE NOT EQUAL TO
. (Square variants come only with one horizontal line?)
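The equivalence classes above can be written down as a few patterns. A minimal sketch, assuming a hypothetical helper classify_not_equal() (not part of this module) which splits a character name into the base relation and the number of horizontal bars in the glyph:

```perl
use strict;
use warnings;

# Connectives grouped by the glyph they describe, following the
# observation above about the v6.1 reference PDFs.
my @rules = (
    [ qr/^(.*?) (?:AND|ABOVE) SINGLE-LINE NOT EQUAL TO$/, 'one bar'  ],
    [ qr/^(.*?) (?:WITH|OR) NOT EQUAL TO$/,               'one bar'  ],
    [ qr/^(.*?) (?:BUT|ABOVE) NOT EQUAL TO$/,             'two bars' ],
);

# Returns (base relation, bar count), or an empty list for other names.
sub classify_not_equal {
    my ($name) = @_;
    for my $rule (@rules) {
        my ($re, $bars) = @$rule;
        return ($1, $bars) if $name =~ $re;
    }
    return;
}

for my $n ('LESS-THAN BUT NOT EQUAL TO',
           'PRECEDES ABOVE SINGLE-LINE NOT EQUAL TO') {
    my ($base, $bars) = classify_not_equal($n);
    print "$n: base <$base>, $bars\n";
}
```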
Set $ENV{UI_KEYBOARDLAYOUT_UNRESOLVED}
to enable warnings. Then do
perl -wlane "next unless /^Unresolved: <(.*?)>/; $s{$1}++; END{print qq($s{$_}\t$_) for keys %s}" oxx | sort -n > oxx-sorted-kw
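The tally done by the one-liner above can also be written as a small script. A minimal sketch; tally_unresolved() is a hypothetical name, and the line format is assumed to be exactly what the one-liner matches:

```perl
use strict;
use warnings;

# Count "Unresolved: <NAME>" lines and return "count<TAB>name" rows,
# sorted by count and then by name (the one-liner leaves the sorting
# to the external sort -n).
sub tally_unresolved {
    my (@lines) = @_;
    my %count;
    for (@lines) {
        $count{$1}++ if /^Unresolved: <(.*?)>/;
    }
    return map { "$count{$_}\t$_" }
           sort { $count{$a} <=> $count{$b} or $a cmp $b } keys %count;
}

# Usage: perl tally.pl oxx > oxx-sorted-kw
if (@ARGV) {
    print "$_\n" for tally_unresolved(<>);
}
```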
SEE ALSO
The keyboard(s) generated with this module: UI::KeyboardLayout::izKeys, http://k.ilyaz.org/
On diacritics:
http://www.phon.ucl.ac.uk/home/wells/dia/diacritics-revised.htm#two
http://en.wikipedia.org/wiki/Tonos#Unicode
http://en.wikipedia.org/wiki/Early_Cyrillic_alphabet#Numerals.2C_diacritics_and_punctuation
http://en.wikipedia.org/wiki/Vietnamese_alphabet#Tone_marks
http://diacritics.typo.cz/
http://en.wikipedia.org/wiki/User:TEB728/temp (Chars of languages)
http://www.evertype.com/alphabets/index.html
Accents in different Languages:
http://fonty.pl/porady,12,inne_diakrytyki.htm#07
http://en.wikipedia.org/wiki/Latin-derived_alphabet
On typography marks
http://wiki.neo-layout.org/wiki/Striche
http://www.matthias-kammerer.de/SonsTypo3.htm
http://en.wikipedia.org/wiki/Soft_hyphen
http://en.wikipedia.org/wiki/Dash
http://en.wikipedia.org/wiki/Ditto_mark
On keyboard layouts:
http://en.wikipedia.org/wiki/Keyboard_layout
http://en.wikipedia.org/wiki/Keyboard_layout#US-International
http://en.wikipedia.org/wiki/ISO/IEC_9995
http://www.pentzlin.com/info2-9995-3-V3.pdf (used almost nowhere - only half of keys in Canadian multilanguage match)
http://en.wikipedia.org/wiki/QWERTY#Canadian_Multilingual_Standard
http://en.wikipedia.org/wiki/Unicode_input
Discussion of layout changes and position of €:
https://www.libreoffice.org/bugzilla/show_bug.cgi?id=5981
History of QWERTY
http://kanji.zinbun.kyoto-u.ac.jp/~yasuoka/publications/PreQWERTY.html
http://kanji.zinbun.kyoto-u.ac.jp/db-machine/~yasuoka/QWERTY/
http://msdn.microsoft.com/en-us/goglobal/bb964651
http://eurkey.steffen.bruentjen.eu/layout.html
http://ru.wikipedia.org/wiki/%D0%A4%D0%B0%D0%B9%D0%BB:Birman%27s_keyboard_layout.svg
http://bepo.fr/wiki/Accueil
http://www.unibuc.ro/e/prof/paliga_v_s/soft-reso/ (Academic for Mac)
http://cgit.freedesktop.org/xkeyboard-config/tree/symbols/ru
http://cgit.freedesktop.org/xkeyboard-config/tree/symbols/keypad
http://www.evertype.com/celtscript/type-keys.html (Old Irish mechanical typewriters)
http://eklhad.net/linux/app/halfqwerty.xkb (One-handed layout)
http://www.doink.ch/an-x11-keyboard-layout-for-scholars-of-old-germanic/ (and references there)
http://www.neo-layout.org/
https://commons.wikimedia.org/wiki/File:Neo2_keyboard_layout.svg
Images in (download of)
http://www.mzuther.de/en/contents/osd-neo2
Neo2 sources:
http://wiki.neo-layout.org/browser/windows/kbdneo2/Quelldateien
Shift keys at center, nice graphic:
http://www.tinkerwithabandon.com/twa/keyboarding.html
Physical keyboard:
http://www.konyin.com/?page=product.Multilingual%20Keyboard%20for%20UNITED%20STATES
Polytonic Greek
http://www.polytoniko.org/keyb.php?newlang=en
Portable keyboard layout
http://www.autohotkey.com/forum/viewtopic.php?t=28447
One-handed
http://www.autohotkey.com/forum/topic1326.html
Typing on numeric keypad
http://goron.de/~johns/one-hand/#documentation
On screen keyboard indicator
http://www.autohotkey.com/docs/scripts/KeyboardOnScreen.htm
Keyboards of ЕС-1840/1/5
http://aic-crimea.narod.ru/Study/Shen/PC/1/5-4-1.htm
(http://www.aic-crimea.narod.ru/Study/Shen/PC/main.htm) PC user's manual (in Russian)
http://fdd5-25.net/fddforum/index.php?PHPSESSID=201bd45ab972f1ab4b440dcb6c7ca18f&topic=489.30
Phonetic Hebrew layout(s) (1st has many duplicates, 2nd overweighted)
http://bc.tech.coop/Hebrew-ZC.html
http://help.keymanweb.com/keyboards/keyboard_galaxiehebrewkm6.php
Greek (Galaxy) with a convenient mapping (except for Ψ) and BibleScript
http://www.tavultesoft.com/keyboarddownloads/%7B4D179548-1215-4167-8EF7-7F42B9B0C2A6%7D/manual.pdf
With 2-letter input of Unicode names:
http://www.jlg-utilities.com
Medievalist's
http://www.personal.leeds.ac.uk/~ecl6tam/
Yandex visual keyboards
http://habrahabr.ru/company/yandex/blog/108255/
Implementation in Firefox
http://mxr.mozilla.org/mozilla-central/source/widget/windows/KeyboardLayout.cpp#1085
Implementation in Emacs 24.3 (ToUnicode() in fns)
http://fossies.org/linux/misc/emacs-24.3.tar.gz:a/emacs-24.3/src/w32inevt.c
http://fossies.org/linux/misc/emacs-24.3.tar.gz:a/emacs-24.3/src/w32fns.c
http://fossies.org/linux/misc/emacs-24.3.tar.gz:a/emacs-24.3/src/w32term.c
Naive implementations:
http://social.msdn.microsoft.com/forums/en-US/windowssdk/thread/07afec87-68c1-4a56-bf46-a38a9c2232e9/
Quality of a keyboard
http://www.tavultesoft.com/keymandev/quality/whitepaper1.1.pdf
Manipulating keyboards on Windows and X11
http://symbolcodes.tlt.psu.edu/keyboards/winkeyvista.html (using links there: up to Win7)
http://windows.microsoft.com/en-us/windows-8/change-keyboard-layout
http://www.howtoforge.com/changing-language-and-keyboard-layout-on-various-linux-distributions
MSKLC parser
http://pastebin.com/UXc1ub4V
By author of MSKLC Michael S. Kaplan (do not forget to follow links)
Input on Windows:
http://seit.unsw.adfa.edu.au/staff/sites/hrp/personal/Sanskrit-External/Unicode-KbdsonWindows.pdf
http://blogs.msdn.com/b/michkap/archive/2006/03/26/560595.aspx
http://blogs.msdn.com/b/michkap/archive/2006/04/22/581107.aspx
Chaining dead keys:
http://blogs.msdn.com/b/michkap/archive/2011/04/16/10154700.aspx
Mapping VK to VSC etc:
http://blogs.msdn.com/b/michkap/archive/2006/08/29/729476.aspx
[Link] Remapping CapsLock to mean Backspace in a keyboard layout
(if repeat, every second Press counts ;-)
http://colemak.com/forum/viewtopic.php?id=870
Scancodes from kbd.h get in the way
http://blogs.msdn.com/b/michkap/archive/2006/08/30/726087.aspx
What happens if you start with .klc with other VK_ mappings:
http://blogs.msdn.com/b/michkap/archive/2010/11/03/10085336.aspx
Keyboards with Ctrl-Shift states:
http://blogs.msdn.com/b/michkap/archive/2010/10/08/10073124.aspx
On assigning Ctrl-values
http://blogs.msdn.com/b/michkap/archive/2008/11/04/9037027.aspx
On hotkeys for switching layouts:
http://blogs.msdn.com/b/michkap/archive/2008/07/16/8736898.aspx
Text services
http://blogs.msdn.com/b/michkap/archive/2008/06/30/8669123.aspx
Low-level access in MSKLC
http://levicki.net/articles/tips/2006/09/29/HOWTO_Build_keyboard_layouts_for_Windows_x64.php
http://blogs.msdn.com/b/michkap/archive/2011/04/09/10151666.aspx
On font linking
http://blogs.msdn.com/b/michkap/archive/2006/01/22/515864.aspx
Unicode in console
http://blogs.msdn.com/michkap/archive/2005/12/15/504092.aspx
Adding formerly "invisible" keys to the keyboard
http://blogs.msdn.com/b/michkap/archive/2006/09/26/771554.aspx
Redefining NumKeypad keys
http://blogs.msdn.com/b/michkap/archive/2007/07/04/3690200.aspx
BUT!!!
http://blogs.msdn.com/b/michkap/archive/2010/04/05/9988581.aspx
And backspace/return/etc
http://blogs.msdn.com/b/michkap/archive/2008/10/27/9018025.aspx
kbdutool.exe, run with the /S ==> .c files
Doing one's own WM_DEADKEY processing
http://blogs.msdn.com/b/michkap/archive/2006/09/10/748775.aspx
Dead keys do not work on SG-Caps
http://blogs.msdn.com/b/michkap/archive/2008/02/09/7564967.aspx
Dynamic keycaps keyboard
http://blogs.msdn.com/b/michkap/archive/2005/07/20/441227.aspx
Backslash/yen/won confusion
http://blogs.msdn.com/b/michkap/archive/2005/09/17/469941.aspx
Unicode output to console
http://blogs.msdn.com/b/michkap/archive/2010/10/07/10072032.aspx
Install/Load/Activate an input method/layout
http://blogs.msdn.com/b/michkap/archive/2007/12/01/6631463.aspx
http://blogs.msdn.com/b/michkap/archive/2008/05/23/8537281.aspx
Reset to a TT font from an application:
http://blogs.msdn.com/b/michkap/archive/2011/09/22/10215125.aspx
How to (not) treat C-A-Q
http://blogs.msdn.com/b/michkap/archive/2012/04/26/10297903.aspx
Treating Brazilian ABNT c1 c2 keys
http://blogs.msdn.com/b/michkap/archive/2006/10/07/799605.aspx
And JIS ¥|-key
(compare with http://www.scs.stanford.edu/11wi-cs140/pintos/specs/kbd/scancodes-7.html
http://hp.vector.co.jp/authors/VA003720/lpproj/others/kbdjpn.htm )
http://blogs.msdn.com/b/michkap/archive/2006/09/26/771554.aspx
Suggest a topic:
http://blogs.msdn.com/b/michkap/archive/2007/07/29/4120528.aspx#7119166
Installable Keyboard Layouts - Apple Developer (“.keylayout” files; modifiers not editable; cache may create problems; to enable deadkeys in X11, one may need extra work)
http://developer.apple.com/technotes/tn2002/tn2056.html
http://wordherd.com/keyboards/
http://stackoverflow.com/questions/999681/how-to-remap-context-menu-key-in-mac-os-x
http://apple.stackexchange.com/questions/21691/ukelele-generated-custom-keyboard-layouts-not-working-in-lion
http://wiki.openoffice.org/wiki/X11Keymaps
http://www.tenshu.net/2012/11/using-caps-lock-as-new-modifier-key-in.html
http://raw.github.com/lreddie/ukelele-steps/master/USExtended.keylayout
http://scripts.sil.org/cms/scripts/page.php?item_id=keylayoutmaker
ANSI/ISO/ABNT/JIS/Russian Apple’s keyboards
https://discussions.apple.com/thread/1508293
http://www.dtp-transit.jp/apple/mac/post_1137.html
http://www.dtp-transit.jp/images/apple-keyboards-US-JIS.jpg
http://m10lmac.blogspot.co.il/2007/02/fixing-brazilian-keyboard-layout.html
http://www2d.biglobe.ne.jp/~msyk/keyboard/layout/mac-jiskbd.html
http://commons.wikimedia.org/wiki/File:KB_Russian_Apple_Macintosh.svg
JIS variations (OADG109 vs A)
http://ja.wikipedia.org/wiki/JIS%E3%82%AD%E3%83%BC%E3%83%9C%E3%83%BC%E3%83%89
Different ways to access chars on Mac (1ˢᵗ suggests adding a Discover via plists via Keycaps≠Strings)
http://apple.stackexchange.com/questions/49565/how-can-i-expand-the-number-of-special-characters-i-can-type-using-my-keyboard
http://developer.apple.com/library/mac/#documentation/cocoa/conceptual/eventoverview/TextDefaultsBindings/TextDefaultsBindings.html#//apple_ref/doc/uid/20000468-CJBDEADF
http://www.hcs.harvard.edu/~jrus/Site/System%20Bindings.html Default keybindings
http://www.hcs.harvard.edu/~jrus/Site/Cocoa%20Text%20System.html
http://hints.macworld.com/article.php?story=2005051118320432 Mystery keys on Mac
http://www.snark.de/index.cgi/0007 Patching ADB drivers
http://www.snark.de/mac/usbkbpatch/index_en.html Patching USB drivers (gives LCtrl vs RCtrl etc???)
http://www.lorax.com/FreeStuff/TextExtras.html (has no docs???)
http://stevelosh.com/blog/2012/10/a-modern-space-cadet/ Combining different approaches
http://brettterpstra.com/2012/12/08/a-useful-caps-lock-key/ (simplified version of ↖)
http://david.rothlis.net/keyboards/microsoft_natural_osx/ Num Lock is claimed as not working
Compose on Mac requires hacks:
http://apple.stackexchange.com/questions/31487/add-compose-key-to-os-x
Convert Apple to MSKLC
http://typophile.com/node/90606
Keyboards on Mac:
http://homepage.mac.com/thgewecke/mlingos9.html
http://web.archive.org/web/20080717203026/http://homepage.mac.com/thgewecke/mlingos9.html
Tool to produce:
http://wordherd.com/keyboards/
http://developer.apple.com/library/mac/#technotes/tn2056/_index.html
VK_OEM_8 Kana modifier - Using instead of AltGr
http://www.kbdedit.com/manual/ex13_replacing_altgr_with_kana.html
Limitations of using KANA toggle
http://www.kbdedit.com/manual/ex12_trilang_ser_cyr_lat_gre.html
FE (Far Eastern) keyboard source code example (NEC AT is 106 with SPECIAL MULTIVK flags changed on some scancodes, OEM_7/8 producing 0x1e 0x1f, and no OEM_102):
http://read.pudn.com/downloads3/sourcecode/windows/248345/win2k/private/ntos/w32/ntuser/kbd/fe_kbds/jpn/ibm02/kbdibm02.c__.htm
http://read.pudn.com/downloads3/sourcecode/windows/248345/win2k/private/ntos/w32/ntuser/kbd/fe_kbds/jpn/kbdnecat/kbdnecat.c__.htm
http://read.pudn.com/downloads3/sourcecode/windows/248345/win2k/private/ntos/w32/ntuser/kbd/fe_kbds/jpn/106/kbd106.c__.htm
Investigation on relation between VK_ assignments, KBDEXT, KBDNUMPAD etc:
http://code.google.com/p/ergo-dvorak-for-developers/source/browse/trunk/kbddvp.c
PowerShell vs ISE (and how to find them [on Win7: WinKey Accessories])
http://blogs.msdn.com/b/powershell/archive/2009/04/17/differences-between-the-ise-and-powershell-console.aspx
http://blogs.msdn.com/b/michkap/archive/2013/01/23/10387424.aspx
http://blogs.msdn.com/b/michkap/archive/2013/02/15/10393862.aspx
http://blogs.msdn.com/b/michkap/archive/2013/02/19/10395086.aspx
http://blogs.msdn.com/b/michkap/archive/2013/02/20/10395416.aspx
Google for "Get modification number for Shift key" for code to query the kbd DLL directly ("keylogger")
http://web.archive.org/web/20120106074849/http://debtnews.net/index.php/article/debtor/2008-09-08/1088.html
http://code.google.com/p/keymagic/source/browse/KeyMagicDll/kbdext.cpp?name=0419d8d626&r=d85498403fd59bca9efc04b4e5bb4406d39439a0
How to read Unicode in an ANSI Window:
http://social.msdn.microsoft.com/Forums/en-US/windowsgeneraldevelopmentissues/thread/d455e846-d18b-4086-98de-822658bcebf0/
http://blog.tavultesoft.com/2011/06/accepting-unicode-input-in-your-windows-application.html
HTML consolidated entity names and discussion, MES charsets:
http://www.w3.org/TR/xml-entity-names
http://www.w3.org/2003/entities/2007/w3centities-f.ent
http://www.cl.cam.ac.uk/~mgk25/ucs/mes-2-rationale.html
http://web.archive.org/web/20000815100817/http://www.egt.ie/standards/iso10646/pdf/cwa13873.pdf
Ctrl2cap
http://technet.microsoft.com/en-us/sysinternals/bb897578
Low level scancode mapping
http://www.annoyances.org/exec/forum/winxp/r1017256194
http://web.archive.org/web/20030211001441/http://www.microsoft.com/hwdev/tech/input/w2kscan-map.asp
http://msdn.microsoft.com/en-us/windows/hardware/gg463447
http://www.annoyances.org/exec/forum/winxp/1034644655
???
http://netj.org/2004/07/windows_keymap
the free remapkey.exe utility that's in Microsoft NT / 2000 resource kit.
perl -wlne "BEGIN{$t = {T => q(), qw( X e0 Y e1 )}} print qq( $t->{$1}$2\t$3) if /^#define\s+([TXY])([0-9a-f]{2})\s+(?:_EQ|_NE)\((?:(?:\s*\w+\s*,){3})?\s*([^\W_]\w*)\s*(?:(?:,\s*\w+\s*){2})?\)\s*(?:\/\/.*)?$/i" kbd.h >ll2
then select stuff up to the first e1 key (but DECIMAL is not there; T53 is DELETE??? Take from MSKLC help/using/advanced/scancodes)
CapsLock as on typewriter:
http://web.archive.org/web/20120717083202/http://www.annoyances.org/exec/forum/winxp/1071197341
Scancodes visible on the low level:
http://openbsd.7691.n7.nabble.com/Patch-Support-F13-F24-on-PC-122-terminal-keyboard-td224992.html
http://www.seasip.info/Misc/1227T.html
Scancodes visible on Windows (with USB)
http://download.microsoft.com/download/1/6/1/161ba512-40e2-4cc9-843a-923143f3456c/translate.pdf
Problems on X11:
http://www.x.org/releases/X11R7.6/doc/kbproto/xkbproto.html (definition of XKB???)
http://wiki.linuxquestions.org/wiki/Configuring_keyboards (current???)
http://wiki.linuxquestions.org/wiki/Accented_Characters (current???)
http://wiki.linuxquestions.org/wiki/Altering_or_Creating_Keyboard_Maps (current???)
https://help.ubuntu.com/community/ComposeKey (documents almost 1/2 of the needed stuff)
http://www.gentoo.org/doc/en/utf-8.xml (2005++ ???)
http://en.gentoo-wiki.com/wiki/X.Org/Input_drivers (2009++ HAS: How to make CapsLock change layouts)
http://www.freebsd.org/cgi/man.cgi?query=setxkbmap&sektion=1&manpath=X11R7.4
http://people.uleth.ca/~daniel.odonnell/Blog/custom-keyboard-in-linuxx11
http://shtrom.ssji.net/skb/xorg-ligatures.html (of 2008???)
http://tldp.org/HOWTO/Danish-HOWTO-2.html (of 2005???)
http://www.tux.org/~balsa/linux/deadkeys/index.html (of 1999???)
http://www.x.org/releases/X11R7.6/doc/libX11/Compose/en_US.UTF-8.html
http://cgit.freedesktop.org/xorg/proto/xproto/plain/keysymdef.h
EIGHT_LEVEL FOUR_LEVEL_ALPHABETIC FOUR_LEVEL_SEMIALPHABETIC PC_SYSRQ : see
http://cafbit.com/resource/mackeyboard/mackeyboard.xkb
./xkb in /etc/X11 /usr/local/X11 /usr/share/local/X11 /usr/share/X11
(maybe it is more productive to try
ls -d /*/*/xkb /*/*/*/xkb
?)
but what dead_diaeresis means is defined here:
Apparently, may be in /usr/X11R6/lib/X11/locale/en_US.UTF-8/Compose /usr/share/X11/locale/en_US.UTF-8/Compose
http://wiki.maemo.org/Remapping_keyboard
http://www.x.org/releases/current/doc/man/man8/mkcomposecache.8.xhtml
Note: having the XIM input method enabled in GTK disables the Control-Shift-u way of entering hex Unicode.
How to contribute:
http://www.freedesktop.org/wiki/Software/XKeyboardConfig/Rules
Note: handling deadkeys via .Compose has two problems. First, .Compose is handled by applications, while keymaps are handled by the server (since they may be on different machines, things can easily get out of sync). Second, .Compose knows nothing about the current "Keyboard group" or the state of CapsLock etc. (therefore emulating "group switch" via composing is impossible).
JS code to add "insert these chars": google for editpage_specialchars_cyrilic, or
http://en.wikipedia.org/wiki/User:TEB728/monobook.jsx
Latin paleography
http://en.wikipedia.org/wiki/Latin_alphabet
http://tlt.its.psu.edu/suggestions/international/bylanguage/oenglish.html
http://guindo.pntic.mec.es/~jmag0042/LATIN_PALEOGRAPHY.pdf
http://www.evertype.com/standards/wynnyogh/ezhyogh.html
http://www.wordorigins.org/downloads/OELetters.doc
http://www.menota.uio.no/menota-entities.txt
http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2957.pdf (Incomplete???)
http://skaldic.arts.usyd.edu.au/db.php?table=mufi_char&if=mufi (No prioritization...)
Summary tables for Cyrillic
http://ru.wikipedia.org/wiki/%D0%9A%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0#.D0.A1.D0.BE.D0.B2.D1.80.D0.B5.D0.BC.D0.B5.D0.BD.D0.BD.D1.8B.D0.B5_.D0.BA.D0.B8.D1.80.D0.B8.D0.BB.D0.BB.D0.B8.D1.87.D0.B5.D1.81.D0.BA.D0.B8.D0.B5_.D0.B0.D0.BB.D1.84.D0.B0.D0.B2.D0.B8.D1.82.D1.8B_.D1.81.D0.BB.D0.B0.D0.B2.D1.8F.D0.BD.D1.81.D0.BA.D0.B8.D1.85_.D1.8F.D0.B7.D1.8B.D0.BA.D0.BE.D0.B2
http://ru.wikipedia.org/wiki/%D0%9F%D0%BE%D0%B7%D0%B8%D1%86%D0%B8%D0%B8_%D0%B1%D1%83%D0%BA%D0%B2_%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D1%8B_%D0%B2_%D0%B0%D0%BB%D1%84%D0%B0%D0%B2%D0%B8%D1%82%D0%B0%D1%85
http://en.wikipedia.org/wiki/List_of_Cyrillic_letters - per language tables
http://en.wikipedia.org/wiki/Cyrillic_alphabets#Summary_table
http://en.wiktionary.org/wiki/Appendix:Cyrillic_script
Extra chars (see also the ordering table on page 8)
http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3194.pdf
Typesetting Old and Modern Church Slavonic
http://www.sanu.ac.rs/Cirilica/Prilozi/Skup.pdf
http://irmologion.ru/ucsenc/ucslay8.html
http://irmologion.ru/csscript/csscript.html
http://cslav.org/success.htm
http://irmologion.ru/developer/fontdev.html#allocating
Non-dialogue of Slavists and Unicode experts
http://www.sanu.ac.rs/Cirilica/Prilozi/Standard.pdf
http://kodeks.uni-bamberg.de/slavling/downloads/2008-07-26_white-paper.pdf
Newer: (+ combining ф)
http://tug.org/pipermail/xetex/2012-May/023007.html
http://www.unicode.org/alloc/Pipeline.html As below, plus N-left-hook, ДЗЖ ДЧ, L-descender, modifier-Ь/Ъ
http://www.synaxis.info/azbuka/ponomar/charset/charset_1.htm
http://www.synaxis.info/azbuka/ponomar/charset/charset_2.htm
http://www.synaxis.info/azbuka/ponomar/roadmap/roadmap.html
http://www.ponomar.net/cu_support.html
http://www.ponomar.net/files/out.pdf
http://www.ponomar.net/files/variants.pdf (5 VS for Mark's chapter, 2 VS for t, 1 VS for the rest)
http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3772.pdf typikon (+[semi]circled), ε-form
http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3971.pdf inverted ε-typikon
http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3974.pdf two variants of o/O
http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3998.pdf Mark's chapter
http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3563.pdf Reversed tse
IPA
http://upload.wikimedia.org/wikipedia/commons/f/f5/IPA_chart_2005_png.svg
http://en.wikipedia.org/wiki/Obsolete_and_nonstandard_symbols_in_the_International_Phonetic_Alphabet
http://en.wikipedia.org/wiki/Case_variants_of_IPA_letters
Table with Unicode points marked:
http://www.staff.uni-marburg.de/~luedersb/IPA_CHART2005-UNICODE.pdf
(except for "Lateral flap" and "Epiglottal" column/row).
(Extended) IPA explained by consortium:
http://unicode.org/charts/PDF/U0250.pdf
IPA keyboard
http://www.rejc2.co.uk/ipakeyboard/
http://en.wikipedia.org/wiki/International_Phonetic_Alphabet_chart_for_English_dialects#cite_ref-r_11-0
Is this discussing KBDNLS_TYPE_TOGGLE on VK_KANA???
http://mychro.mydns.jp/~mychro/mt/2010/05/vk-f.html
Windows: fonts substitution/fallback/replacement
http://msdn.microsoft.com/en-us/goglobal/bb688134
Problems on Windows:
http://en.wikipedia.org/wiki/Help:Special_characters#Alt_keycodes_for_Windows_computers
http://en.wikipedia.org/wiki/Template_talk:Unicode#Plane_One_fonts
Console font: Lucida Console 14 is viewable, but has practically no Unicode support.
Consolas (good at 16) has much better Unicode support (sometimes better, sometimes worse than DejaVu)
DejaVu is good at 14 (equal to a GUI font size 9 on 15in 1300px screen; 16px unifont is native at 12 here)
http://cristianadam.blogspot.com/2009/11/windows-console-and-true-type-fonts.html
Apparently, Windows picks up the flavor (Bold/Italic/etc.) of DejaVu at random; see
http://jpsoft.com/forums/threads/strange-results-with-cp-1252.1129/
- he got it in bold. I'm getting it in italic... Workaround: uninstall
all flavors but one (the BOOK flavor), THEN enable it for the console... Then reinstall
(preferably newer versions).
Display (how WikiPedia does it):
http://en.wikipedia.org/wiki/Help:Special_characters#Displaying_special_characters
http://en.wikipedia.org/wiki/Template:Unicode
http://en.wikipedia.org/wiki/Template:Unichar
http://en.wikipedia.org/wiki/User:Ruud_Koot/Unicode_typefaces
In CSS: .IPA, .Unicode { font-family: "Arial Unicode MS", "Lucida Sans Unicode"; }
http://web.archive.org/web/20060913000000/http://en.wikipedia.org/wiki/Template:Unicode_fonts
Inspect which font is used by Firefox:
https://addons.mozilla.org/en-US/firefox/addon/fontinfo/
Windows shortcuts:
http://windows.microsoft.com/en-US/windows7/Keyboard-shortcuts
http://www.redgage.com/blogs/pankajugale/all-keyboard-shortcuts--very-useful.html
https://skydrive.live.com/?cid=2ee8d462a8f365a0&id=2EE8D462A8F365A0%21141
http://windows.microsoft.com/en-us/windows-8/new-keyboard-shortcuts
On meaning of Unicode math codepoints
http://milde.users.sourceforge.net/LUCR/Math/unimathsymbols.pdf
http://milde.users.sourceforge.net/LUCR/Math/data/unimathsymbols.txt
http://unicode.org/Public/math/revision-09/MathClass-9.txt
http://www.w3.org/TR/MathML/
http://www.w3.org/TR/xml-entity-names/
http://www.w3.org/TR/xml-entity-names/bycodes.html
Monospaced fonts with combining marks (!)
https://bugs.freedesktop.org/show_bug.cgi?id=18614
https://bugs.freedesktop.org/show_bug.cgi?id=26941
Indic ISCII - any hope with it? (This is not representable...:)
http://unicode.org/mail-arch/unicode-ml/y2012-m09/0053.html
(Perceived) problems of Unicode (2001)
http://www.ibm.com/developerworks/library/u-secret.html
On a need to have input methods for unicode
http://unicode.org/mail-arch/unicode-ml/y2012-m07/0226.html
On info on Unicode chars
http://unicode.org/mail-arch/unicode-ml/y2012-m07/0415.html
Zapf dingbats encoding, and other fine points of AdobeGL:
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/ADOBE/zdingbat.txt
http://web.archive.org/web/20001015040951/http://partners.adobe.com/asn/developer/typeforum/unicodegn.html
Yet another (IMO, silly) way to handle '; fight: ' vs ` ´
http://www.cl.cam.ac.uk/~mgk25/ucs/apostrophe.html
Surrogate characters on IE
HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\International\Scripts\42
http://winvnkey.sourceforge.net/webhelp/surrogate_fonts.htm
http://msdn.microsoft.com/en-us/library/aa918682.aspx Script IDs
Quoting tchrist: You can snag unichars
, uniprops
, and uninames
from http://training.perl.com if you like.
Tom's unicode scripts
http://search.cpan.org/~bdfoy/Unicode-Tussle-1.03/lib/Unicode/Tussle.pm
.XCompose: on docs and examples
Syntax of .XCompose
is (partially) documented in
http://www.x.org/archive/current/doc/man/man5/Compose.5.xhtml
http://cgit.freedesktop.org/xorg/lib/libX11/tree/man/Compose.man
# Modifiers are not documented
# (Shift, Alt, Lock, Ctrl with aliases Meta, Caps; apparently,
# ! is applied to a sequence without ~ ???)
Semantics (e.g., which of the keybindings takes precedence) are not documented. Experiments (see below) show that a longer binding wins; if the lengths are the same, the one which is loaded later wins. The relation with the presence of modifiers is not clear.
# (the source of imLcPrs.c shows that the expansion of the
# shorter sequence is stored too - but the presence of
# ->succession means that the code to process the resulting
# tree ignores the expansion).
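The experimentally observed precedence (a longer binding wins; of equal bindings, the later one wins) suggests a simple check for rules which can never fire. A minimal sketch under stated assumptions: parse_rules() and shadowed() are hypothetical helpers, modifiers (~Ctrl etc.) and include lines are ignored, and only the <keysym> tokens to the left of the first colon are compared.

```perl
use strict;
use warnings;

# Reduce every rule to its keysym sequence, e.g. "Multi_key a b".
sub parse_rules {
    my (@lines) = @_;
    my @rules;
    for my $line (@lines) {
        next if $line =~ /^\s*(?:#|include\b|$)/;    # comments, includes, blanks
        my ($lhs) = $line =~ /^([^:]*):/ or next;    # text before the first colon
        my @keys = $lhs =~ /<([^<>\s]+)>/g or next;
        push @rules, join ' ', @keys;
    }
    return @rules;
}

# Report sequences which lose: duplicates defined earlier, and sequences
# which are a proper prefix of another sequence.
sub shadowed {
    my (@rules) = @_;
    my (%last, @out);
    $last{ $rules[$_] } = $_ for 0 .. $#rules;       # index of last definition
    for my $i (0 .. $#rules) {
        my $seq = $rules[$i];
        if ($last{$seq} != $i) {
            push @out, "$seq: redefined later";
        } elsif (grep { index($_, "$seq ") == 0 } keys %last) {
            push @out, "$seq: prefix of a longer sequence";
        }
    }
    return @out;
}

my @demo = (
    '<Multi_key> <a> <b> : "1"',
    '<Multi_key> <a> <b> <c> : "2"',
    '<Multi_key> <x> : "3"',
    '<Multi_key> <x> : "4"',
);
print "$_\n" for shadowed(parse_rules(@demo));
```

Since the interplay with modifiers is not clear, a rule reported here may still fire when its competitor carries a modifier restriction.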
Before the syntax was documented: For the best approximation, read the parser's code, e.g., google for
inurl:compose.c XCompose
site:cgit.freedesktop.org "XCompose"
site:cgit.freedesktop.org "XCompose" filetype:c
_XimParseStringFile
http://cgit.freedesktop.org/xorg/lib/libX11/tree/modules/im/ximcp/imLcIm.c
http://cgit.freedesktop.org/xorg/lib/libX11/tree/modules/im/ximcp/imLcPrs.c
http://uim.googlecode.com/svn-history/r6111/trunk/gtk/compose.c
http://uim.googlecode.com/svn/tags/uim-1.5.2/gtk/compose.c
The actual use of the compiled compose table:
http://cgit.freedesktop.org/xorg/lib/libX11/tree/modules/im/ximcp/imLcFlt.c
Apparently, the first node (= defined last) in the tree which matches keysym and modifiers is chosen. So to override <Foo> <Bar>
, looks like (checked to work!) ~Ctrl <Foo>
may be used... On the other hand, defining both <Foo> <Bar> <Baz>
and (later) <Foo> ~Ctrl <Bar>
, one would expect that <Foo> <Ctrl-Bar> <Baz>
should still trigger the expansion of <Foo> <Bar> <Baz>
— but it does not... See also:
http://cgit.freedesktop.org/xorg/lib/libX11/tree/modules/im/ximcp/imLcLkup.c
The file .XCompose is processed by X11 clients on startup. Changes to this file should be seen immediately by all newly started clients (but GTK or Qt applications may need extra config - see below) unless the directory ~/.compose-cache is present and has a cache file compatible with the binary architecture (then changes are not seen until the cache expires, one day after creation). The name .XCompose may be overridden by the environment variable XCOMPOSEFILE
.
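When edits to .XCompose seem to have no effect, it helps to check which file is actually read and whether a cache may be hiding the change. A minimal sketch; user_compose_file() and fresh_cache_files() are hypothetical helpers, and treating any cache file modified less than one day ago as "not yet expired" is an assumption (modification time stands in for creation time):

```perl
use strict;
use warnings;

# The user Compose file: XCOMPOSEFILE, if set, replaces ~/.XCompose.
sub user_compose_file {
    my ($env) = @_;
    return $env->{XCOMPOSEFILE} if defined $env->{XCOMPOSEFILE};
    return "$env->{HOME}/.XCompose";
}

# Files in ~/.compose-cache younger than one day (may hide recent edits).
sub fresh_cache_files {
    my ($home) = @_;
    my $dir = "$home/.compose-cache";
    opendir my $dh, $dir or return;      # no cache directory: nothing hidden
    my @fresh = grep { -f $_ and -M $_ < 1 } map { "$dir/$_" } readdir $dh;
    closedir $dh;
    return @fresh;
}

if (defined $ENV{HOME}) {
    print "Compose file: ", user_compose_file(\%ENV), "\n";
    print "Fresh cache:  $_\n" for fresh_cache_files($ENV{HOME});
}
```

Removing the listed cache files (or the whole directory) is one way to make newly started clients re-read the file.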
To get (better?) examples, google for "multi_key" partial alpha "DOUBLE-STRUCK"
.
# include these first, so they may be overridden later
include "%H/my-Compose/.XCompose-kragen"
include "%H/my-Compose/.XCompose-ootync"
include "%H/my-Compose/.XCompose-pSub"
Check success: kragen: \ space
--> ␣; ootync: o F
--> ℉; pSub: 0 0
--> ∞ ...
Older versions of X11 do not understand %L and %S, but do understand %H
(e.g., Debian Squeeze 6.0.6; according to
http://packages.debian.org/search?keywords=x11-common
it has v 1:7.5+8+squeeze1
).
include "/etc/X11/locale/en_US.UTF-8/Compose"
include "/usr/share/X11/locale/en_US.UTF-8/Compose"
Import default rules from the system Compose file: usually as above (but supported only on newer systems):
include "%L"
detect the success of the lines above: get #
by doing Compose + +
...
The next file to include has been generated by
perl -wlne 'next if /#\s+CIRCLED/; print if />\s+<.*>\s+<.*>\s+<.*/' /usr/share/X11/locale/en_US.UTF-8/Compose
### Std tables contain quadruple prefix for GREEK VOWELS and CIRCLED stuff
### only. But there is a lot of triple prefix...
perl -wne 'next if /#\s+CIRCLED/; $s{$1}++ or print qq( $1) if />\s+<.*>\s+<.*>\s+<.*"(.*)"/' /usr/share/X11/locale/en_US.UTF-8/Compose
## – — ☭ ª º Ǖ ǖ Ǘ ǘ Ǚ ǚ Ǜ ǜ Ǟ ǟ Ǡ ǡ Ǭ ǭ Ǻ ǻ Ǿ ǿ Ȫ ȫ Ȭ ȭ Ȱ ȱ ʰ ʱ ʲ ʳ ʴ ʵ ʶ ʷ ʸ ˠ ˡ ˢ ˣ ˤ ΐ ΰ Ḉ ḉ Ḕ ḕ Ḗ ḗ Ḝ ḝ Ḯ ḯ Ḹ ḹ Ṍ ṍ Ṏ ṏ Ṑ ṑ Ṓ ṓ Ṝ ṝ Ṥ ṥ Ṧ ṧ Ṩ ṩ Ṹ ṹ Ṻ ṻ Ấ ấ Ầ ầ Ẩ ẩ Ẫ ẫ Ậ ậ Ắ ắ Ằ ằ Ẳ ẳ Ẵ ẵ Ặ ặ Ế ế Ề ề Ể ể Ễ ễ Ệ ệ Ố ố Ồ ồ Ổ ổ Ỗ ỗ Ộ ộ Ớ ớ Ờ ờ Ở ở Ỡ ỡ Ợ ợ Ứ ứ Ừ ừ Ử ử Ữ ữ Ự ự ἂ ἃ ἄ ἅ ἆ ἇ Ἂ Ἃ Ἄ Ἅ Ἆ Ἇ ἒ ἓ ἔ ἕ Ἒ Ἓ Ἔ Ἕ ἢ ἣ ἤ ἥ ἦ ἧ Ἢ Ἣ Ἤ Ἥ Ἦ Ἧ ἲ ἳ ἴ ἵ ἶ ἷ Ἲ Ἳ Ἴ Ἵ Ἶ Ἷ ὂ ὃ ὄ ὅ Ὂ Ὃ Ὄ Ὅ ὒ ὓ ὔ ὕ ὖ ὗ Ὓ Ὕ Ὗ ὢ ὣ ὤ ὥ ὦ ὧ Ὢ Ὣ Ὤ Ὥ Ὦ Ὧ ᾀ ᾁ ᾂ ᾃ ᾄ ᾅ ᾆ ᾇ ᾈ ᾉ ᾊ ᾋ ᾌ ᾍ ᾎ ᾏ ᾐ ᾑ ᾒ ᾓ ᾔ ᾕ ᾖ ᾗ ᾘ ᾙ ᾚ ᾛ ᾜ ᾝ ᾞ ᾟ ᾠ ᾡ ᾢ ᾣ ᾤ ᾥ ᾦ ᾧ ᾨ ᾩ ᾪ ᾫ ᾬ ᾭ ᾮ ᾯ ᾲ ᾴ ᾷ ῂ ῄ ῇ ῒ ῗ ῢ ῧ ῲ ῴ ῷ ⁱ ⁿ ℠ ™ שּׁ שּׂ а̏ А̏ е̏ Е̏ и̏ И̏ о̏ О̏ у̏ У̏ р̏ Р̏ 🙌
The following excerpt from NEO compose tables may be good if you use keyboards which do not generate dead keys, but may generate Cyrillic keys; in other situations, edit filtering/naming on the following download command and on the include
line below. (For my taste, most bindings are useless since they contain keysymbols which may be generated with NEO, but not with less intimidating keylayouts.)
(Filtering may be important, since having a large file may significantly slow down client's startup (without ~/.compose-cache???).)
# perl -wle 'foreach (qw(base cyrillic greek lang math)) {my @i=@ARGV; $i[-1] .= qq($_.module?format=txt); system @i}' wget -O - http://wiki.neo-layout.org/browser/Compose/src/ | perl -wlne 'print unless /<(U[\dA-F]{4,6}>|dead_|Greek_)/' > .XCompose-neo-no-Udigits-no-dead-no-Greek
include "%H/.XCompose-neo-no-Udigits-no-dead-no-Greek"
# detect the success of the line above: get ♫ by doing Compose Compose (but this binding is overwritten later!)
###################################### Neo's Math contains junk at line 312
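The filtering step of the download command above can be tried in isolation before fetching anything. A minimal sketch; keep_rule() is a hypothetical name, the regular expression is the one from the command, and the sample rules are made up for illustration:

```perl
use strict;
use warnings;

# True for rules to keep: those mentioning neither <Uxxxx> keysyms,
# nor dead keys, nor Greek keysyms.
sub keep_rule {
    my ($line) = @_;
    return $line !~ /<(?:U[\dA-F]{4,6}>|dead_|Greek_)/;
}

# Made-up sample rules (not taken from the NEO tables).
my @sample = (
    '<Multi_key> <Cyrillic_a> <minus> : "sample-1"',
    '<dead_acute> <a> : "sample-2"',
    '<Multi_key> <U2260> : "sample-3"',
    '<Multi_key> <Greek_alpha> : "sample-4"',
);
print "$_\n" for grep { keep_rule($_) } @sample;
```

To adapt the filter to other keyboards, edit the alternatives in the pattern and rename the output file and the include line to match.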
Print with something like (loading in a web browser after this):
perl -l examples/filter-XCompose ~/.XCompose-neo-no-Udigits-no-dead-no-Greek > ! o-neo
env LC_ALL=C sort -f o-neo | column -x -c 130 > ! /tmp/oo-neo-x
“Systematic” parts of rules in a few .XCompose
================== .XCompose b=bepo o=ootync k=kragen p=pSub s=std
b Double-Struck b
o circled ops b
O big circled ops b
r rotated b 8ACETUv ∞
- sub p
= double arrows po
g greek po
m math p |=Double-Struck rest haphazard...
O circles p Oo
S stars p Ss
^ sup p added: i -
| daggers p
Double mathop ok +*&|%8CNPQRZ AE
# thick-black arrows o
-,Num- arrows o
N/N fractions o
hH pointing hands o
O circled ops o
o degree o
rR roman nums o
\ UP upper modifiers o
\ DN lower modifiers o
{ set theoretic o
| arrows |-->flavors o
UP / roots o
LFT DN 6-quotes, bold delim o
RT DN 9-quotes, bold delim o
UP,DN super,sub o
DOUBLE-separated-by-& op k ( )
in-() circled k xx for tensor
in-[] boxed, dice, play-cards k
BKSP after revert k
< after revert k
` after small-caps k
' after hook k
, after hook below k
h after phonetic k
# musical k
%0 ROMAN k %_0 for two-digit
% roman k %_ for two-digit
* stars k
*. var-greek k
* greek k
++, 3 triple k
+ double k
, quotes k
!, / negate k
6,9 6,9-quotes k
N N fractions k
= double-arrows, RET k
CMP x2 long names k
f hand, pencils k
\ combining??? k
^ super, up modifier k
_ low modifiers k
|B, |W chess, checkers, B&W k
| double-struck k
ARROWS ARROWS k
! dot below s
" diaeresis s
' acute s
trail < left delimiter s
trail > right delimiter s
trail \ sloped variant s
( ... ) circled s
( greek aspirations s
) greek aspirations s
+ horn s
, cedilla s
. dot above s
- hor. bar s
/ diag, vert hor. bar s
; ogonek s
= double hor.bar s
trail = double hor.bar s
? hook above s
b breve s
c check above s
iota iota below s
trail 0338 negated s
o ring above s
U breve s
SOME HEBREW
^ circumflex s
^ _ superscript s
^ undbr superscript s
_ bar s
_ subscript s
underbr subscript s
` grave s
~ greek dieresis s
~ tilde s
overbar bar s
´ acute s ´ is not '
¸ cedilla s ¸ is cedilla
LIMITATIONS
Currently only output for Windows keyboard layout drivers (via MSKLC) is available.
Currently only the keyboards with US-mapping of hardware keys to "the etched symbols" are supported (think of German physical keyboards where Y/Z keycaps are swapped: Z is etched between T and U, and Y is to the left of X, or French which swaps A and Q, or French or Russian physical keyboards which have more alphabetical keys than 26).
While the architecture of assembling a keyboard from small easy-to-describe pieces is (IMO) elegant and very powerful, and is proven to be useful, it still looks like a collection of independent hacks. Many of these hacks look quite similar; it would be great to find a way to unify them, and so reduce the repertoire of operations for assembly.
The current documentation of the module’s functionality is not complete.
The implementation of the module is crumbling under its own weight. It evolved by bloating (even when some design features were simplified). Since initially I had very little clue as to which level of abstraction and flexibility the keyboard description would evolve, the bloat accumulated to incredible amounts.
COPYRIGHT
Copyright (c) 2011-2013 Ilya Zakharevich <ilyaz@cpan.org>
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.0 or, at your option, any later version of Perl 5 you may have available.
The distributed examples may have their own copyrights.
TODO
UniPolyK-MultiSymple
Multiple linked faces (accessible as described in ChangeLog); designated Primary- and Secondary- switch keys (as Shift-Space and AltGr-Space now).
Soft hyphen as a deadkey may be not a good idea: following it by a special key (such as Shift-Tab, or Control-Enter) may insert the deadkey character??? Hence the character should be highly visible... (Now the key is invisible, so this is irrelevant...)
Currently linked layers must have exactly the same number of keys in VK-tables.
VK tables for TAB, BACK were BS. Same (remains) for the rest of unusual keys... (See TAB-was.) But UTOOL cannot handle them anyway...
Define an extra element in VK keys: linkable. Should be sorted first in the kbd map, and there should be the same number in linked lists. Non-linkable keys should not be linked together by deadkey access...
The interaction of FromToFlipShift with SelectRX is not intuitive. This works: Diacritic[<sub>](SelectRX[[0-9]](FlipShift(Latin)))
DefinedTo cannot be put on Cyrillic 3a9 (yo to superscript disappears - due to duplication???).
... so we do it differently now, but: LinkLayer was not aggressively resolving all the occurrences of a character on a layer before we started to combine it with Diacritic_if_undef... - and Cyrillic 3a9 is not helped...
via_parent() is broken - cannot replace for Diacritic_if_undef.
Currently, we map epigraphic letters to capital letters - is it intuitive???
dotted circle ◌ 25CC
  DeadKey_Map200A= FlipLayers
  #DeadKey_Map200A_0= Id(Russian-AltGr)
  #DeadKey_Map200A_1= Id(Russian)
performs differently from the commented variant: it adds links to auto-filled keys...
Why ¨ on THIN SPACE inserts OGONEK after making ¨ multifaceted???
When splitting a name on OVER/BELOW/ABOVE, we need both sides as modifiers???
Ỳ currently unreachable (appears only in Latin-8 Celtic, is not on Wikipedia)
Somebody is putting an extra element at the end of arrays for layers??? - Probably SPACE...
Need to treat upside-down as a pseudo-decomposition.
We decompose reversed-smallcaps in one step - probably better add yet another two-steps variant...
When creating a <pseudo-stuff> treat SYMBOL/SIGN/FINAL FORM/ISOLATED FORM/INITIAL FORM/MEDIAL FORM; note that SIGN may be stripped: LESS-THAN SIGN becomes LESS-THAN WITH DOT
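The SIGN-stripping remark can be illustrated with a minimal sketch. The helper candidate_names() is hypothetical and is not part of the module; it only handles a trailing SIGN or SYMBOL (not the FORM variants), and it assumes the lookup is done by Unicode name via the core charnames module.

```perl
use strict;
use warnings;
use charnames ();

# Hypothetical helper (not part of the module): list the names to try when
# attaching a modifier such as "WITH DOT" to a base character name.
sub candidate_names {
    my ($base, $modifier) = @_;
    my @candidates = ("$base $modifier");
    # Also try the base name with a trailing SIGN or SYMBOL removed
    (my $stripped = $base) =~ s/\s+(?:SIGN|SYMBOL)$//;
    push @candidates, "$stripped $modifier" if $stripped ne $base;
    return @candidates;
}

# Look up each candidate name; unknown names give undef
for my $name (candidate_names('LESS-THAN SIGN', 'WITH DOT')) {
    my $code = charnames::vianame($name);
    printf "%-26s %s\n", $name,
        defined $code ? sprintf('U+%04X', $code) : '(no such character)';
}
```

Only the second candidate, LESS-THAN WITH DOT, names an existing character (U+22D6); this is why the unstripped name alone would miss it.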
We do not do canonical-merging of diacritics; so one needs to specify VARIA in addition to GRAVE ACCENT.
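To see why name-based matching needs both words, one can compare the name of a precomposed Greek letter with the names of the characters in its canonical decomposition. This is only an illustration using the core modules charnames and Unicode::Normalize; the helper decomposition_names() is hypothetical and does not show how the module itself works.

```perl
use strict;
use warnings;
use charnames ();
use Unicode::Normalize qw(NFD);

# Hypothetical helper: names of the characters in the canonical decomposition
sub decomposition_names {
    my ($char) = @_;
    return map { charnames::viacode(ord $_) } split //, NFD($char);
}

my $char = "\x{1F70}";    # the name of this character says VARIA
print charnames::viacode(ord $char), "\n";
print '  = ', join(' + ', decomposition_names($char)), "\n";
```

The precomposed name says VARIA, while the combining mark in the decomposition is named COMBINING GRAVE ACCENT, so a search keyed on names alone does not connect the two.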
We use a smartish algorithm to assign multiple diacritics to the same deadkey. A REALLY smart algorithm would use information about when a particular precombined form was introduced in Unicode...
Inspector tool for NamesList.txt:
grep " WITH .* " ! | grep -E -v "(ACUTE|GRAVE|ABOVE|BELOW|TILDE|DIAERESIS|DOT|HOOK|LEG|MACRON|BREVE|CARON|STROKE|TAIL|TONOS|BAR|DOTS|ACCENT|HALF RING|VARIA|OXIA|PERISPOMENI|YPOGEGRAMMENI|PROSGEGRAMMENI|OVERLAY|(TIP|BARB|CORNER) ([A-Z]+WARDS|UP|DOWN|RIGHT|LEFT))$" | grep -E -v "((ISOLATED|MEDIAL|FINAL|INITIAL) FORM|SIGN|SYMBOL)$" |less
grep " WITH " ! | grep -E -v "(ACUTE|GRAVE|ABOVE|BELOW|TILDE|DIAERESIS|CIRCUMFLEX|CEDILLA|OGONEK|DOT|HOOK|LEG|MACRON|BREVE|CARON|STROKE|TAIL|TONOS|BAR|CURL|BELT|HORN|DOTS|LOOP|ACCENT|RING|TICK|HALF RING|COMMA|FLOURISH|TITLO|UPTURN|DESCENDER|VRACHY|QUILL|BASE|ARC|CHECK|STRIKETHROUGH|NOTCH|CIRCLE|VARIA|OXIA|PSILI|DASIA|DIALYTIKA|PERISPOMENI|YPOGEGRAMMENI|PROSGEGRAMMENI|OVERLAY|(TIP|BARB|CORNER) ([A-Z]+WARDS|UP|DOWN|RIGHT|LEFT))$" | grep -E -v "((ISOLATED|MEDIAL|FINAL|INITIAL) FORM|SIGN|SYMBOL)$" |less
AltGrMap should be made CapsLock aware (impossible: smart capslock works only on the first layer, so the dead char must be on the first layer). [May work for Shift-Space - but it has a bag of problems...]
Alas, CapsLock'ing a composition cannot be made stepwise. Hence one must calculate it directly. (Oops, Windows CapsLock is not configurable on the AltGr-layer. One may need to convert it to VK_KANA???)
WarnConflicts[exceptions] and NoConflicts translation map parsing rules.
Need a way to map to a different face, not a different layer.
Vietnamese: to put a second accent over ă, ơ (o/horn), put them over ae/oe; - including another ˘ which would "cancel the implied one", so will get o-horn itself. - Except for the acute accent which should be replaced by ¨, and the hook must be replaced by ˆ. (Over ae/oe there is only macron and diaeresis over ae.)
Or: for the purpose of taking a second accent, AltGr-A behaves as Ă (or Â?), AltGr-O behaves as Ô (or O-horn Ơ?). Then Å and O/ behave as the other one... And ˚ puts the dot *below*, macron puts a hook. Exception: ¨ acts as ´ on the unaltered AE.
While Å takes acute accent, one can always input it via putting ˚ on Á.
If Ê is on the keyboard (and macron puts a hook), then the only problem is how to enter a hook alone (double circumflex is not precombined), dot below (???), and accents on u-horn ư.
Mogrification rules for double accents: AE Å OE O/ Ù mogrify into hatted/horned versions; macron mogrifies into a hook; second hat modifies a hat into a horn. The only problem: one won't be able to enter double grave on U - use the OTHER combination of ¨ and `... And how to enter dot below on non-accented aue? Put ¨ on umlaut? What about Ë?
To allow . or , on VK_DECIMAL: maybe make CapsLock-dependent?
http://blogs.msdn.com/b/michkap/archive/2006/09/13/752377.aspx
How to write this diacritic recipe: insert hacheck on AltGr-variant, but only if the breve on the base layer variant does not insert hacheck (so inserts breve)???
Sorting diacritics by usefulness: we want to apply one of accents from the given list to a given key (with l layers of 2 shift states). For each accent, we have 2l possible variants for composition; assign to 2 variants differing by Shift the minimum penalty of the two. For each layer we get several possible combinations of different priority; and for each layer, we have a certain number of slots open. We can redistribute combinations from the primary layer to secondary one, but not between secondary layers.
Work with slots one-by-one (so that the assignment is "monotonic" when the number of slots increases). Let m be the number of layers where slots are present. Take the highest priority combinations; if the number of "extra" combinations in the primary layer is at least m, distribute the first m of them to secondary layers. If n<m of them are present, first fill the k layers which have no combinations of their own, then the other n-k layers. More precisely, if n<=k, use the first n of the "free" layers; if n>k, fill all free layers, then the last n-k of the non-free layers.
Repeat as needed (on each step, at most one slot in each layer appears).
But we do not need to separate case-differing keys! How to fix?
All done, but this works only on the current face! To fix, need to pass to the translator all the face-characters present on the given key simultaneously.
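One step of the slot-by-slot distribution described above (the case analysis on n, m and k) can be sketched as follows. The function distribute_extras() and its data layout are hypothetical and are not taken from the module. The sketch also assumes that the extras are handed out in priority order, which the description does not specify.

```perl
use strict;
use warnings;

# Hypothetical helper: choose which secondary layers receive the "extra"
# combinations of the primary layer on one step.
#   $extras: array ref of extra combinations, highest priority first
#   $layers: array ref of [$layer_name, $has_own_combinations], one entry
#            per secondary layer which has an open slot
sub distribute_extras {
    my ($extras, $layers) = @_;
    my $m = @$layers;
    my $n = @$extras < $m ? @$extras : $m;    # at most one slot per layer
    my @targets;
    if ($n == $m) {                           # enough extras for all layers
        @targets = map { $_->[0] } @$layers;
    } else {
        my @free     = map { $_->[0] } grep { !$_->[1] } @$layers;
        my @non_free = map { $_->[0] } grep {  $_->[1] } @$layers;
        my $k = @free;
        if ($n <= $k) {                       # the first n of the free layers
            @targets = @free[0 .. $n - 1];
        } else {                              # all free, then the last n-k non-free
            @targets = (@free, @non_free[@non_free - ($n - $k) .. $#non_free]);
        }
    }
    my %assigned;
    @assigned{@targets} = @$extras[0 .. $n - 1];
    return \%assigned;
}

my $layers = [ [L1 => 1], [L2 => 0], [L3 => 1], [L4 => 0] ];
my $got = distribute_extras([qw(a b c)], $layers);
print "$_ => $got->{$_}\n" for sort keys %$got;
```

Here m=4, n=3 and k=2 (L2 and L4 are free), so the free layers L2 and L4 are filled first, and the remaining combination goes to the last non-free layer, L3.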
===== Accent-key TAB accesses extra bindings (including NUM->numbered one)
(may be problematic with some applications???
-- so duplicate it on + and @ if they are not occupied
-- there is nothing related to AT in Unicode)
Diacritics_0218_0b56_0c34= May create such a thing... (0b56_0c34 invisible to the user).
Hmm - how to combine penalized keys with reversion? It looks like
the higher priority bindings would occupy the hottest slots in both
direct and reverse bindings...
Maybe additional forms Diacritics2S_* and Diacritics2E_* which fight
for symbols of the same penalty from start and from end (with S winning
on stuff exactly in the middle...). (The E-form would also strip the last |-group.)
Shift-Space (from US face) should access the second level of Russian face. To avoid infinite cycles, face-switch keys to non-private faces should be marked in each face...
"Acute makes sharper" is applicable to () too to get <>-parens...
Other ways of combining: "OR EQUAL TO", "OR EQUIVALENT TO", "APL FUNCTIONAL SYMBOL QUAD", "APL FUNCTIONAL SYMBOL *** UNDERBAR", "APL FUNCTIONAL SYMBOL *** DIAERESIS".
When recognizing symbols for GREEK, treat LUNATE (as NOP). Try adding HEBREW LETTER at start as well...
Compare with: 8 basic accents: http://en.wikipedia.org/wiki/African_reference_alphabet (English 78)
When a diacritic on a base letter expands to several variants, use them all (with penalty according to the flags).
Problem: acute on acute makes double acute modifier...
Penalized letters are temporarily completely ignored; need to attach them in the end... - but not 02dd which should be completely ignored...
Report characters available on diacritic chains, but not accessible via such chains. Likewise for characters not accessible at all. Mark certain chains as "Hacks" so that they are not counted in these lists.
Long s and "preceded by" are not handled since the table has its own (useless) compatibility decompositions.
╒╤╕
╞╪╡
╘╧╛
╓╥╖
╟╫╢
╙╨╜
╔╦╗
╠╬╣
╚╩╝
┌┬┐
├┼┤
└┴┘
┎┰┒
┠╂┨
┖┸┚
┍┯┑
┝┿┥
┕┷┙
┏┳┓
┣╋┫
┗┻┛
On top of a light-lines grid (3×2, 2×3, 2×2; H, V, V+H):
┲┱
╊╉
┺┹
┢╈┪
┡╇┩
╆╅
╄╇
╼†━†╾†╺†╸†╶†─†╴†╌†┄†┈† †╍†┅†┉†
╼━╾╺╸╶─╴╌┄┈ ╍┅┉
╻
┃
╹
╷
│
╵
╽
╿
╎┆┊╏┇┋
╲ ╱
╳
╭╮
╰╯
◤▲◥
◀■▶
◣▼◢
◜△◝
◁□▷
◟▽◞
◕◓◔
◐○◑
◒
▗▄▖
▐█▌
▝▀▘
▛▀▜
▌ ▐
▙▄▟
░▒▓
Implementation details
Since the FullFace[FNAME] accessor may have different effects at different moments of the synthesis of a face FNAME, here is the order in which FullFace[FNAME] changes:
ini_layers: essentially, contains what is given in the key “layers” of the face recipe
Later, a version of these layers with exportable keys marked is created as ini_layers_prefix.
ini_filled_layers: adds extra (fake) keys containing control characters and created via-VK-keys
(For these extended layers, the previous version can be inspected via ini_copy1.)
(created when exportable keys are handled.)
The next modification is done not by modifying the list of names of layers associated to the face, but by editing the corresponding layers in place. (The unmodified version of a layer, the one containing the exportable keys, is accessible via ini_copy.) On this step one adds the missing characters from the face specified in the LinkFace key.