NAME
UI::KeyboardLayout - Module for designing keyboard layouts
SYNOPSIS
#!/usr/bin/perl -wC31
use UI::KeyboardLayout;
use strict;
# Download from http://www.unicode.org/Public/UNIDATA/
UI::KeyboardLayout::->set_NamesList("$ENV{HOME}/Downloads/NamesList.txt");
my $i = do {local $/; open my $in, '<', 'MultiUni.kbdd' or die; <$in>};
# Init from in-memory copy of the configfile
my $k = UI::KeyboardLayout:: -> new_from_configfile($i)
-> fill_win_template( 1, [qw(faces CyrillicPhonetic)] );
print $k;
open my $f, '<', "$ENV{HOME}/Downloads/NamesList.txt" or die;
my $k = UI::KeyboardLayout::->new();
my ($d,$c,$names,$blocks,$extraComb,$uniVersion) = $k->parse_NameList($f);
close $f or die;
$k->print_decompositions($d);
$k->print_compositions ($c);
UI::KeyboardLayout::->set_NamesList("$ENV{HOME}/Downloads/NamesList.txt",
"$ENV{HOME}/Downloads/DerivedAge.txt");
my $l = UI::KeyboardLayout::->new();
$l->print_compositions;
$l->print_decompositions;
UI::KeyboardLayout::->set_NamesList("$ENV{HOME}/Downloads/NamesList-6.1.0d8.txt",
"$ENV{HOME}/Downloads/DerivedAge-6.1.0d13.txt");
my $l = UI::KeyboardLayout::->new_from_configfile('examples/EurKey++.kbdd');
for my $F (qw(US CyrillicPhonetic)) {
# Open file, select()
print $l->fill_win_template(1, ['faces', $F]);
$l->print_coverage($F);
}
perl -wC31 UI-KeyboardLayout\examples\grep_nameslist.pl "\b(ALPHA|BETA|GAMMA|DELTA|EPSILON|ZETA|ETA|THETA|IOTA|KAPPA|LAMDA|MU|NU|XI|OMICRON|PI|RHO|SIGMA|TAU|UPSILON|PHI|CHI|PSI|OMEGA)\b" ~/Downloads/NamesList.txt >out-greek
AUTHORS
Ilya Zakharevich, ilyaz@cpan.org
DESCRIPTION
In this section, a "keyboard" has a certain "character repertoire" (the set of characters which may be entered using this keyboard), and a mapping associating each character in the repertoire to a keypress or to several (sequential or simultaneous) keypresses. A small enough keyboard may have a pretty arbitrary mapping and remain useful (witness QWERTY vs Dvorak vs Colemak). However, if a keyboard has a sufficiently large repertoire, there must be a strong logic ("orthogonality") in this association - otherwise most of the repertoire will not be useful (except for people who have an extraordinary memory - and are ready to invest part of it into the keyboard).
"Character repertoire" needs of different people vary enormously; observing the people around me, I get a very narrow point of view. But it is the best I can do; what I observe is that many of them would use 1000-2000 characters if they had a simple way to enter them; and the needs of different people overlap only a little. So to be helpful to different people, a keyboard should have at least 2000-3000 different characters in the repertoire. (Some ballpark comparisons: MES-3B has about 2800 characters; Adobe Glyph List corresponds to about 3600 Unicode characters.)
To access these characters, how much structure does one need to carry in memory? One can make a (trivial) estimate from below: on Windows, the standard US keyboard allows entering 100 - or 104 - characters (94 ASCII keys, SPACE, ENTER, TAB - moreover, C-ENTER, BACKSPACE and C-BACKSPACE also produce characters; so do C-[, C-], C-\ and C-Break in most layouts!). If one needs about 30 times more, one could do with 5 different ways to "mogrify" a character; if these mogrifications are "orthogonal", then there are 2^5 = 32 ways of combining them, and one could access 32*104 = 3328 characters.
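The estimate above is easy to replay. The following is only a sketch of the arithmetic; the function names are made up for illustration and are not part of this module.

```perl
use strict;
use warnings;

# Characters reachable on the standard US keyboard (the estimate from the text).
my $base_chars = 104;

# With $n orthogonal (independently combinable) mogrifications there are
# 2**$n combinations, each applicable to every base character.
sub reachable {
    my ($n) = @_;
    return 2**$n * $base_chars;
}

# Smallest number of orthogonal mogrifications covering a target repertoire.
sub mogrifications_needed {
    my ($target) = @_;
    my $n = 0;
    $n++ while reachable($n) < $target;
    return $n;
}

printf "%d mogrifications give %d characters\n", 5, reachable(5);
printf "3000 characters need %d mogrifications\n", mogrifications_needed(3000);
```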
Of course, the characters in a "reasonable repertoire" form a very amorphous mass; there is no way to introduce a structure like that which is "natural" (so that there is a hope for "ordinary people" to keep it in memory). So the complexity of these mogrifications is not in their number, but in their "nature". One may try to decrease this complexity by having very easy to understand mogrifications - but then there is no hope of having 5 of them - or 10, or 15, or 20.
However, we know that many people are able to memorise the layout of 70 symbols on a keyboard. So would they be able to handle, for example, 30 different "natural" mogrifications? And how large a repertoire of characters would one be able to access using these mogrifications?
This module does not answer these questions directly, but it provides tools for investigating them, and tools to construct the actually working keyboard layouts based on these ideas. It consists of the following principal components:
- Unicode table examiner
  distills relations between different Unicode characters from the Unicode tables, and combines the results with user-specified "manual mogrification" rules. From these automatic/manual mogrifications, it constructs orthogonal scaffolding supporting Unicode characters (we call it composition/decomposition, but it is a major generalization of the corresponding Unicode consortium's terms).
- Layout constructor
  allows building keyboard layouts based on the above mogrification rules, and on other visual and/or logical directives. It combines the bulk-handling ability of an automatic rule-based approach with the flexibility provided by a system of manual overrides. (The rules are read from a .kbdd Keyboard Description file.)
- System-specific software layouts
  may be created based on the "theoretical layout" made by the layout constructor (currently only on Windows, and only via the KBDUTOOL route).
- Report/Debugging framework
  creates human-readable descriptions of the layout, and/or debugging reports on how the layout creation logic proceeded.
The last (and, probably, the most important) component of the distribution is an example keyboard layout created using this toolset.
Keyboard description files
Syntax
I could not find an appropriate existing configuration file format, so was forced to invent yet-another-config-file-format. Sorry...
The config file initializes a tree implementing a hash of hashes of hashes etc. whose leaves are either strings or arrays of strings, and whose keys are words. The file consists of "sections"; each section fills a certain hash in the tree.
Sections are separated by "section names" which are sequences of word characters and / (possibly empty) enclosed in square brackets. [] is the root hash, then [word] is a hash referenced by the key word in the root hash, then [word/another] is a hash referenced by the key another in the hash referenced by [word], etc. Additionally, a section separator may look like [visual -> wordsAndSlashes].
Sections are of two types: normal and visual. A normal section consists of comments (starting with #) and assignments. An assignment is in one of 4 forms:
word=value
+word=value
@word=value,value,value,value
/word=value/value/value/value
The first assigns a string value to the key word in the hash of the current section. The second adds a value to an array referenced by the key word; the other two add several values. Trailing whitespace is stripped.
Any string value without end-of-line characters and trailing whitespace can be added this way (and values without commas or without slashes can be added in bulk to arrays). In particular, there may be no whitespace before the = sign, and the whitespace after = is a part of the value.
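To make the rules for normal sections concrete, here is a minimal sketch of a reader for them. It is not this module's own parser: parse_normal_sections is a hypothetical helper, it skips empty lines (an assumption), and it rejects [visual -> ...] section headers instead of handling them. The values in the sample (such as USAltGr) are made up.

```perl
use strict;
use warnings;

# Parse "normal section" syntax into a tree of nested hashes (sketch only).
sub parse_normal_sections {
    my ($text) = @_;
    my %root;
    my $cur = \%root;                       # lines before any header fill the root
    for my $line (split /\n/, $text) {
        $line =~ s/\s+\z//;                 # trailing whitespace is stripped
        next if $line eq '' or $line =~ /^#/;
        if ($line =~ m{^\[([\w/]*)\]\z}) {  # [word/another] selects a nested hash
            my $path = $1;
            $cur = \%root;
            for my $word (grep { length } split m{/}, $path) {
                $cur = $cur->{$word} ||= {};
            }
        } elsif (my ($op, $key, $val) = $line =~ m{^([+\@/]?)(\w+)=(.*)\z}s) {
            if    ($op eq '')  { $cur->{$key} = $val }                          # word=value
            elsif ($op eq '+') { push @{ $cur->{$key} }, $val }                 # +word=value
            elsif ($op eq '@') { push @{ $cur->{$key} }, split /,/, $val, -1 }  # @word=v,v
            else               { push @{ $cur->{$key} }, split m{/}, $val, -1 } # /word=v/v
        } else {
            die "Unrecognized line: $line\n";
        }
    }
    return \%root;
}

my $tree = parse_normal_sections(<<'EOT');
[]
VERSION=0.03
[faces/US]
# a comment
LinkFace=CyrillicPhonetic
@layers=US,USAltGr
+extra= padded
EOT

print join('|', @{ $tree->{faces}{US}{layers} }), "\n";
```

Note that the value of extra keeps its leading space, since whitespace after = is a part of the value.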
Visual sections consist of comments, assignments, and content, which is the rest of the section. Comments after the last assignment become parts of the content. The content is preserved as a whole, and assigned to the key unparsed_data; trailing whitespace is stripped. (This is the way to insert a value containing end-of-line characters.)
In the context of this distribution, the intent of visual sections is to be parsed by a postprocessor. So the only purpose of explicit assignments in a visual section is to configure how the rest is parsed; after the parsing is done (and the result is copied elsewhere in the tree) these values should preferably not be used.
Semantics of visual sections
Two types of visual sections are supported: DEADKEYS and KBD. The content of a DEADKEYS section is just an embedded (part of a) .klc file. We can read deadkey mappings and deadkey names from such sections. The name of the section becomes the name of the mapping functions which may be used inside the Diacritic_* rule (or in a recipe for a computed layer).
The content of a KBD section consists of #-comment lines and "the mapping lines"; every "mapping line" encodes one row in a keyboard (in one or several layouts). (But the makeup of rows of this keyboard may be purely imaginary; it is normal to have a "keyboard" with one row of numbers 0...9.) Configuration settings specify how many lines there are per row, how many layers are encoded by every line, and what the names of these layers are:
visual_rowcount # how many config lines per row of keyboard
visual_per_row_counts # Array of length visual_rowcount
visual_prefixes # Array of chars; <= visual_rowcount (miss=SPACE)
prefix_repeat # How many times prefix char is repeated (n/a to SPACE)
in_key_separator # If several layers per row, splits a key-descr
layer_names # Where to put the resulting keys array
Each line consists of a prefix (which is ignored except for sanity checking), and a whitespace-separated list of key descriptions. (Whitespace followed by a combining character is not separating.) Each key description is split using in_key_separator into slots, one slot per layout. (The leading in_key_separator is not separating.) Each key/layout description consists of one or two entries. An entry is either two dashes -- (standing for empty), or a character, or a hex number of length >=4. (A hex number must be separated by . from neighboring word characters.) A loner character which has a different uppercase is auto-replicated in uppercase form. A missing or empty key/layout description gives two empty entries (note that the leading key/layout description cannot be empty; same for "the whole key description" - use the leading --).
For compatibility with other components, layer names should not contain characters +()[].
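The treatment of key descriptions can be sketched as below. This is only an approximation of the rules above: parse_slot and parse_key are hypothetical names, the tokenization of hex numbers is more lenient than the rule about the separating dot, combining characters are not handled, and leaving the shifted entry undefined for a loner without a distinct uppercase is an assumption.

```perl
use strict;
use warnings;

# Parse one key/layout description (a "slot") into [unshifted, shifted].
# Entries: "--" (empty), a hex number of 4+ digits, or a single character.
sub parse_slot {
    my ($slot) = @_;
    return [undef, undef] unless defined $slot and length $slot;
    my @entries;
    while ($slot =~ /\G(?:(--)|\.?([0-9A-Fa-f]{4,})\.?|(.))/gs) {
        if    (defined $1) { push @entries, undef }           # explicit empty entry
        elsif (defined $2) { push @entries, chr(hex($2)) }    # hex code point
        else               { push @entries, $3 }              # literal character
    }
    die "Too many entries in '$slot'\n" if @entries > 2;
    if (@entries == 1) {
        my $c = $entries[0];
        # A loner with a distinct uppercase is replicated in uppercase form;
        # otherwise the shifted entry stays undefined (an assumption).
        push @entries, (defined($c) and uc($c) ne $c) ? uc($c) : undef;
    }
    return \@entries;
}

# Split a key description into slots; a leading separator does not separate.
sub parse_key {
    my ($key, $sep, $nlayers) = @_;
    my @slots = split /(?<=.)\Q$sep\E/s, $key;
    return [ map { parse_slot($slots[$_]) } 0 .. $nlayers - 1 ];
}

# One row, two layers per key, with "/" as in_key_separator.
my $row = [ map { parse_key($_, '/', 2) } split ' ', 'q/0439 -- 2@/0022' ];

for my $key (@$row) {
    print join(' | ',
        map { join ',', map { defined($_) ? sprintf('U+%04X', ord) : '--' } @$_ } @$key), "\n";
}
```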
Inclusion of .klc files
Instead of including a .klc file (or its part) verbatim in a visual section, one can make a section DEADKEYS/NAME/name1/nm2 with a key klc_filename. The named file will be included and parsed as a DEADKEYS visual section (with name DEADKEYS/name1/nm2 ???). (Currently only UTF-16 files are supported.)
Metadata
A metadata entry is either a string, or an array. A string behaves as if it were an array with the string repeated sufficiently many times. Each personality defines MetaData_Index which chooses the element of the array. The entries
COMPANYNAME LAYOUTNAME COPYR_YEARS LOCALE_NAME LOCALE_ID
DLLNAME SORT_ORDER_ID_ LANGUAGE_NAME
should be defined in the personality section, or above this section in the configuration tree. (Used for output Windows .klc files.)
Optional metadata currently consists only of the VERSION key.
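The string-or-array convention for metadata amounts to the following lookup. This is a sketch with a hypothetical helper name and made-up values; it does not show how the module itself searches upwards in the configuration tree.

```perl
use strict;
use warnings;

# A plain string applies to every personality; an array is indexed by the
# personality's MetaData_Index.
sub metadata_for {
    my ($entry, $index) = @_;
    return ref($entry) eq 'ARRAY' ? $entry->[$index] : $entry;
}

my %meta = (
    COMPANYNAME => 'Example Co',            # shared by all personalities
    LAYOUTNAME  => [ 'Latin', 'Cyrillic' ], # one element per personality
);

print metadata_for($meta{COMPANYNAME}, 1), "\n";
print metadata_for($meta{LAYOUTNAME}, 1), "\n";
```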
Layer/Face/Prefix-key Recipes
The sections layer_recipes and face_recipes contain instructions on how to build Layers and Faces out of simpler elements. Similar recipes appear as values of DeadKey_* entries in a face. Such a "recipe" is executed with parameters: a base face name, a layer number, and a prefix character (the latter is undefined when the recipe is a layer recipe or face recipe). (The recipe is free to ignore the parameters; for example, most recipes ignore the prefix character even when they are "prefix key" recipes.)
The recipes and the visual sections are the most important components of the description of a keyboard group.
To construct layers of a face, a face recipe is executed several times with a different "layer number" parameter. In contrast, in the simplest cases a layer recipe is executed once. However, when the layer is a part of a compound ("parent") recipe, it inherits the "parameters" from the parent. In particular, it may be executed several times with a different face name (if used in different faces), or with a different layer number (if used - explicitly or implicitly - in different layer slots; for example, Mutator(LayerName) in a face/prefix-key recipe will execute the LayerName recipe separately for all the layer numbers; or one can use Layers(Empty+LayerName) together with Layers(LayerName+Other)). Depending on the recipe, these calls may result in the same layout of the resulting layers, or in different layouts.
A recipe may be of three kinds: it is either a "first comer wins" which is a space-separated collection of simpler recipes, or SELECTOR(COMPONENTS), or a "mutator": MUTATOR(BASE) or just MUTATOR. All recipes must be ()-balanced and []-balanced; so must be MUTATOR; in turn, the BASE is either a layer name, or another recipe. A layer name must be defined either in a visual KBD section, or be a key in the layer_recipes section (so it should not have +()[] characters), or be the literal Empty. When MUTATOR(BASE) is processed, first, the resulting layer(s) of the BASE recipe are calculated; then the layer(s) are processed by the MUTATOR (one key at a time).
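The balancing requirement and the MUTATOR(BASE) shape can be checked with a few lines. The following is a sketch, not the module's parser: is_balanced and split_mutator are hypothetical names, space-separated "first comer wins" lists are not handled, and the regular expression assumes that the mutator's PARAMETERS do not contain the sequence "](".

```perl
use strict;
use warnings;

# True if every ( ) and [ ] in the recipe is properly nested and closed.
sub is_balanced {
    my ($recipe) = @_;
    my %closer = ('(' => ')', '[' => ']');
    my @stack;
    for my $ch (split //, $recipe) {
        if (exists $closer{$ch}) {
            push @stack, $closer{$ch};      # remember which closer we expect
        } elsif ($ch eq ')' or $ch eq ']') {
            return 0 unless @stack and pop(@stack) eq $ch;
        }
    }
    return @stack ? 0 : 1;
}

# Split "MUTATOR(BASE)" into its two parts; a bare "MUTATOR" has no base.
sub split_mutator {
    my ($recipe) = @_;
    die "Unbalanced recipe: $recipe\n" unless is_balanced($recipe);
    # WORD or WORD[PARAMETERS], optionally followed by (BASE)
    my ($mutator, $base) = $recipe =~ /^(\w+(?:\[.*?\])?)(?:\((.*)\))?\z/s
        or die "Not of the form MUTATOR or MUTATOR(BASE): $recipe\n";
    return ($mutator, $base);
}

my ($mutator, $base) = split_mutator('ByPairs[.:](FlipLayers)');
print "mutator=$mutator base=$base\n";
```

Since BASE may itself be a recipe, a full implementation would apply the same split recursively to $base.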
The most important SELECTOR keywords are Face (with argument a face name, defined either via a faces/FACENAME section, or via face_recipes) and Layers (with argument of the form LAYER_NAME+LAYER_NAME+..., with layer names defined as above). Both select the layer (out of a face, or out of a list) with number equal to the "layer number parameter" in the context of the recipe. The FlipLayers builder is similar to Face, but chooses the "other" layer ("cyclically the next" layer if more than 2 are present).
The other selectors are Self, LinkFace and FlipLayersLinkFace; they operate on the base face or the face associated to the base face.
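The selection rule of Layers and FlipLayers is a matter of indexing. The sketch below works on layer names only; both function names are hypothetical, and layer numbers are assumed to start at 0.

```perl
use strict;
use warnings;

# Layers(NAME+NAME+...): pick the layer name whose position equals the
# "layer number" parameter of the recipe.
sub layers_selector {
    my ($arg, $layer_number) = @_;
    my @names = split /\+/, $arg;
    return $names[$layer_number];
}

# FlipLayers: pick the "other" layer of a face, i.e. cyclically the next one.
sub flip_layers_selector {
    my ($face_layers, $layer_number) = @_;
    return $face_layers->[ ($layer_number + 1) % @$face_layers ];
}

print layers_selector('Empty+LayerName', 1), "\n";
print flip_layers_selector([ 'Plain', 'AltGr' ], 1), "\n";
```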
The simplest forms of MUTATORS are Id, lc, uc, Empty. Recall that a layer is nothing more than a structure associating a pair "unshifted/shifted character" to the key number, and that these characters may be undefined. These simplest mutators modify these characters independently of their key numbers and shift state (with Empty making all of them undefined). Similar user-defined simple mutators are ByPairs[PAIRS]; here PAIRS consists of pairs "FROM TO" of characters (with optional spaces between pairs); "unknown" characters are undefined. (As usual, characters may be replaced by hex numbers with 4 or more hex digits; separate the number from a neighboring word character by . [dot].)
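What a ByPairs-style mutator does to a layer can be sketched as follows. The layer representation (an array indexed by key number, each element being [unshifted, shifted]) follows the description above; the function names are hypothetical, and the hex-number notation for characters is not handled.

```perl
use strict;
use warnings;

# Build a character map from a PAIRS string: consecutive FROM TO characters,
# with optional spaces between pairs.
sub pairs_to_map {
    my ($pairs) = @_;
    (my $compact = $pairs) =~ s/\s+//g;
    my @chars = split //, $compact;
    die "Odd number of characters in PAIRS\n" if @chars % 2;
    my %map = @chars;
    return \%map;
}

# Apply the map to every character of a layer, independently of key number
# and shift state; characters missing from the map become undefined.
sub by_pairs {
    my ($map, $layer) = @_;
    my @out;
    for my $key (@$layer) {
        push @out, [ map { defined($_) ? $map->{$_} : undef } @$key ];
    }
    return \@out;
}

my $layer   = [ [ '.', '>' ], [ 'a', 'A' ] ];
my $mutated = by_pairs(pairs_to_map('.: aA'), $layer);

print join(' ', map { join '', map { defined($_) ? $_ : '-' } @$_ } @$mutated), "\n";
```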
All mutators must have a form WORD or WORD[PARAMETERS], with PARAMETERS (),[]-balanced. Other simple mutators are FlipShift .................. ......................... Note that Id(LAYERNAME) is similar to a selector; it is the only way to insert a layer without a selector, since a bareword is interpreted as a MUTATOR; Id(LAYERNAME) is a synonym of Layers(LAYERNAME+LAYERNAME+...) (repeated as many times as there are layers in the parameter "base face").
Mutate (and its flavors) is the most important mutator.
The recipes in a space-separated list of recipes ("first comer wins") are interpreted independently to give a collection of layers to combine; then, for every key number and both shift states, one takes the leftmost recipe which produces a defined character for this position, and the result is put into the resulting layer.
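The combining step of "first comer wins" can be sketched as below, assuming each recipe has already been evaluated to a layer (an array indexed by key number, each element being [unshifted, shifted]). The function name is hypothetical.

```perl
use strict;
use warnings;

# For every key number and shift state, take the character from the leftmost
# layer which defines it.
sub first_comer_wins {
    my @layers = @_;
    my $nkeys = 0;
    for my $layer (@layers) {
        $nkeys = @$layer if @$layer > $nkeys;
    }
    my @result;
    for my $k (0 .. $nkeys - 1) {
        for my $shift (0, 1) {
            my ($winner) = grep { defined }
                           map  { $_->[$k] ? $_->[$k][$shift] : undef } @layers;
            $result[$k][$shift] = $winner;  # stays undefined if nobody defines it
        }
    }
    return \@result;
}

my $first  = [ [ undef, 'A' ], [ 'b', undef ] ];
my $second = [ [ 'a', 'X' ], [ undef, 'B' ], [ 'c', 'C' ] ];
my $merged = first_comer_wins($first, $second);

print join(' ', map { join '', map { defined($_) ? $_ : '-' } @$_ } @$merged), "\n";
```

In this example X never shows up: the first layer already defines the shifted character of key 0.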
Keep in mind that sometimes to understand what a recipe does, one should trace its description in opposite order: for example, ByPairs[.:](FlipLayers) creates a layout where : is at position of ., but on the second [=other] layer (essentially, if the base layout is the standard one, it binds character : to keypress AltGr-.).
To simplify formatting of .kbdd files, a recipe may be an array reference. The string may be split on spaces, or split after comma or |.
Personalities
A personality NAME is defined in the section faces/NAME. (NAME may include slashes - untested???)
An array layers gives the list of layers forming the face. (As of version 0.03, only 2 layers are supported.) The string LinkFace is a face.........
Substitutions
In the section Substitutions one defines composition rules which may be used on par with composition rules extracted from the Unicode Character Database. An array FOO is converted to a hash accessible as <subst-FOO> from a Diacritic filter of the satellite face processor. An element of the array must consist of two characters (the first is mapped to the second one). If both characters have upper-case variants, the translation between these variants is also included.
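The conversion of a Substitutions array into a hash can be sketched like this. The helper name is hypothetical, and "has an upper-case variant" is interpreted here as "uc() gives a different character", which is an assumption about the module's behavior.

```perl
use strict;
use warnings;

# Convert two-character elements into a FROM => TO hash, adding the
# upper-case translation when both characters have upper-case variants.
sub substitution_map {
    my @elements = @_;
    my %map;
    for my $pair (@elements) {
        die "Element must consist of two characters\n" unless length($pair) == 2;
        my ($from, $to) = split //, $pair;
        $map{$from} = $to;
        my ($FROM, $TO) = (uc($from), uc($to));
        $map{$FROM} = $TO if $FROM ne $from and $TO ne $to;
    }
    return \%map;
}

# a => GREEK SMALL LETTER ALPHA (also gives A => capital alpha);
# - => MINUS SIGN (no upper-case variants, so no extra entry).
my $subst = substitution_map("a\x{3B1}", "-\x{2212}");

printf "%d entries\n", scalar keys %$subst;
```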
Classification of diacritics
The section Diacritics contains arrays each describing a class of diacritic marks. Each array may contain up to 7 elements, each consisting of diacritic marks in the order of similarity to the "principal" mark of the array. Combining characters may be preceded by horizontal space. Elements should contain:
Surrogate chars; 8bit chars; Modifiers
Modifiers below (or above if base char is below)
Vertical (or Comma-like or Doubled or Dotlike or Rotated or letter-like) Modifiers
Prime-like or Centered modifiers
Combining
Combining below (or above if base char is below)
Vertical combining and dotlike Combining
These lists determine what a Diacritic2Self filter of the satellite face processor will produce when followed by whitespace characters (possibly with modifiers) SPACE ENTER TAB BACKSPACE. (So, if the .kbdd file requires this, this determines what diacritic prefix keys produce.)
Naming prefix keys
Section DEADKEYS defines naming of prefix keys. If not named there (or in processed .klc files), the Unicode name of the character will be used.
Keyboards: on ease of access
Let's start with trivialities: different people have different needs with respect to keyboard layouts. For a moment, ignore the question of the repertoire of characters available via keyboard; then the most crucial distinction corresponds to a certain scale. In absence of a better word, we use a provisional name "the required typing speed".
One example of people on the "quick" (or "rabid"?) pole of this scale are people who type a lot of text which is either "already prepared", or for which the "quality of prose" is not crucial. Quite often, these people may type in excess of 100 words per minute. For them, the most important questions are of physical exhaustion from typing. The position of the most frequent letters relative to the "rest" finger position, whether letters frequently typed together are on different hands (or at least not on the same/adjacent fingers), the distance fingers must travel when typing common words, how many keypresses are needed to reach a letter/symbol which is not "on the face of the keyboard" - their primary concerns are of this kind.
On the other, "deliberate", pole these concerns cease to be crucial. On this pole are people who type while they "create" the text, and what takes most of their focus is this "creation" process. They may "polish their prose", or the text they write may be overburdened by special symbols - anyway, what they concentrate on is not typing itself.
For them, the details of the keyboard layout are important mostly in the relation to how much they distract the writer from the other things the writer is focused on. The primary question is now not "how easy it is to type this", but "how easy it is to recall how to type this". The focus transfers from the mechanics of finger movements to the psycho/neuro/science of memory.
These questions are again multifaceted: there are symbols one encounters every minute; after you recall once how to access them, most probably you won't need to recall them again - until you have a long interval when you do not type. The situation is quite different with symbols you need once per week - most probably, each time you will need to recall them again and again. If such rarely used symbols/letters are frequent (since many of them appear), it is important to have an easy way to find how to type them; on the other hand, probably there is very little need for this way to be easily memorizable. And for symbols which you need once per day, one needs both an easy way to find how to type them, and a way to type them which is easily memorizable.
Now add to this the fact that for different people (so: different usage scenarios) this division into "all the time/every minute/every day/every week" categories is going to be different. And one should not forget the important scenario of going on vacation: when you return, you need to "reboot" your typing skills from the dormant state.
On the other hand, note that the questions discussed above are more or less orthogonal: if the logic of recollection requires ω to be related in some way to the W-key, then it does not matter where the W-key is on the keyboard - the same logic is applicable to the QWERTY base layout, or the BÉPO one, or Colemak, or Dvorak. This module concerns itself only with the questions of "consistency" and the related question of "the ease of recall"; we care only about which symbols relate to which "base keys", and do not care about where the base keys sit on the physical keyboard.
NOTE: the version 0.01 of this module supports only the standard US layout of the base keys.
Now consider the question of the character repertoire: a person may need ways to type "continuously" in several languages; in addition to this, there may be a need to occasionally type "standalone" characters or symbols outside the repertoire of these languages. Moreover, these languages may use different scripts (such as Polish/Bulgarian/Greek/Arabic/Japanese), or may share a "bulk" of their characters, and differ only in some "exceptional letters". To add insult to injury, these "exceptional letters" may be rare in the language (such as ÿ in French or à in Swedish) or may have a significant letter frequency (such as é in French) or be somewhere in between (such as ñ in Spanish).
And the non-language symbols do not need to be the math symbols (although often they are). An English-language discussion of etymology at a coffee table may lead to a need to write down a word in polytonic Greek, or Old Norse; next moment one would need to write a phonetic transcription in IPA/APA symbols. A discussion of keyboard layout may involve writing down symbols for non-character keys of the keyboard. A typography freak would optimize a document by fine-tuned whitespaces. Almost everybody needs arrow symbols, and many people would use box drawing characters if they had simple access to them.
Essentially, this means that as far as it does not impact other accessibility goals, it makes sense to have unified memorizable access to as many symbols/characters as possible. (An example of impacting other aspects: Microsoft's (and IBM's) "US International" keyboards steal characters `~'^": typing them produces "unexpected results" - they are deadkeys. This significantly simplifies entering characters with accents, but makes it harder to enter non-accented characters.)
One of the most known principles of design of human-machine interaction is that "simple common tasks should be simple to perform, and complicated tasks should be possible to perform". I strongly disagree with this principle - IMO, it lacks a very important component: "a gradual increase in complexity". When a certain way of doing things is easy to perform, and another similar way is still "possible to perform", but on a very elevated level of complexity, this leads to a significant psychological barrier erected between these ways. Even when switching from the first way to the other one has significant benefits, this barrier leads to self-censorship. Essentially, people will ignore the benefits even if they exceed the penalty of "the elevated level of complexity" mentioned above. And IMO self-censorship is the worst type of censorship. (There is a certain similarity between this situation and that of "self-fulfilling prophecies". "People won't want to do this, so I would not make it simpler to do" - and now people do not want to do this...)
So I would add another clause to the law above: "and moderately complicated tasks should remain moderately hard to perform". What does it tell us in the situation of keyboard layout? One can separate several levels of complexity.
- Basic:
  There should be some "base keyboards": keyboard layouts used for continuous typing in a certain language or script. Access from one base keyboard to letters of another should be as simple as possible.
- By parts:
  If a symbol can be thought of as a combination of certain symbols accessible on the base keyboard, one should be able to "compose" the symbol: enter it by typing a certain "composition prefix" key then the combination (as far as the combination is unambiguously associated to one symbol).
  The "thoughts" above should be either obvious (as in "combining a and e should give æ") or governed by simple mnemonic rules; the rules should cover as wide a range as possible (as in "Greek/Coptic/Hebrew/Russian letters are combined as G/C/H/R and the corresponding Latin letter; the correspondence is phonetic, or, in presence of conflicts, visual").
- Quick access:
  As many non-basic letters as possible (of those expected to appear often) should be available via shortcuts. The same should be applicable to starting sequences of composition rules (such as "instead of typing StartCompose and ' one can type AltGr-'").
- Smart access:
  Certain non-basic characters may be accessible by shortcuts which are not based on composition rules. However, these shortcuts should be deducible by using simple mnemonic rules (such as "to get a vowel with `-accent, type AltGr-key with the physical keyboard's key sitting below the vowel key").
- Superdeath:
  If everything else fails, the user should be able to enter a character by its Unicode number (preferably in the most frequently referenced format: hexadecimal).
  NOTE: This does not seem to be easily achievable, but it looks like a very nifty UI: a certain HotKey is reserved (e.g., AltGr-AppMenu); when it is tapped, and a character-key is pressed (for example, B), a menu-driven interface pops up where the user may navigate to different variants of B, Beta, etc - each of the variants with a hotkey to reach it NOW, and with instructions how to reach it later from the keyboard without this UI.
  Also: if a certain timeout passes after pressing the initial HotKey, an instruction what to do next should appear.
Here are the finer points elaborating on these levels of complexity:
- It looks reasonable to allow "fuzzy mnemonic rules": the rules which specify several possible variants where to look for the shortcut (up to 3-4 variants). If/when one forgets the keying of the shortcut, but remembers such a rule, a short experiment with these positions allows one to reconstruct the lost memory.
- The "base keyboards" (those used for continuous typing in a certain language or script) should be identical to some "standard" widely used keyboards. These keyboards should differ from each other in position of keys used by the scripts only; the "punctuation keys" should be in the same position. If a script B has more letters than a script A, then a lot of "punctuation" on the layout A will be replaced by letters in the layout B. This missing punctuation should be made available by pressing a modifier (AltGr? compare with Microsoft's Vietnamese keyboard's top row).
- If more than one base keyboard is used, there must be a quick access: if one needs to enter one letter from layout B when the active layout is A, one should not be forced to switch to B, type the letter, then switch back to A. It must be available on "Quick_Access_Key letter".
- One should consider what the Quick_Access_Key does when the layouts A and B are identical on a particular key. One can go with the "Occam's razor" approach and make Quick_Access_Key the do-nothing identity map. Alternatively, one can make it access some symbols useful both for script A and script B. It is a judgement call.
  Note that there is a gray area when layouts A and B are not identical, but a key K produces punctuation in layout A, and a letter in layout B. Then when in layout B, this punctuation is available on AltGr-key, so, in principle, Quick_Access_Key would duplicate the functionality of AltGr. Compare with "there is more than one way to do it" below; remember that the OS (or misbehaving applications) may make some keypresses "unavailable". I feel that in these situations, having duplication is a significant advantage over "having some extra symbols available".
- Paired symbols (such as ≤≥, «», ‹›, “”, ‘’) should be put on paired keyboard's keys: <> or [] or ().
- "Directional symbols" (such as arrows) should be put either on the numeric keypad or on a 3×3 subgrid on the letter-part of the keyboard (such as QWE/ASD/ZXC). (Compare with [broken?] implementation in Neo2.)
- For symbols that are naturally thought of as sitting in a table, one can create an intuitive mapping of quite large tables to the keyboard. Split each key in halves by a horizontal line, think of Shift-key as sitting in the top half. Then ignoring the `~ key and most of the punctuation on the right hand side, the keyboard becomes an 8×10 grid. Taking into account AltGr, one can map up to an 8×10×2 (or, in some cases, 8×20!) table to a keyboard.
  Example: Think of IPA consonants.
- Cheatsheets are useful. And there are people who are ready to dedicate a piece of their memory to where on a layout a symbol particularly useful to them is. So even if there is no logical position for a certain symbol, but there is an empty slot on the layout, one should not hesitate in using this slot.
  However, this will be distracting to people who do not want to dedicate their memory to "special cases". So it makes sense to have three kinds of cheatsheets for layouts: one with special cases ignored (useful for most people), one with all general cases ignored (useful for checks "is this symbol available in some place I do not know about" and for memorization), and one with all the bells and whistles.
- "There is more than one way to do it" is not a defect, it is an asset. If it is a reasonable expectation to find a symbol X on keypress K', and the same holds for keypress K'' and they both do not conflict with other "being intuitive" goals, go with both variants. Same for 3 variants, 4 - now you get my point.
  Example: The standard Russian phonetic layout has Ё on the ^-key; on the other hand, Ё is a variant of Е; so it makes sense to have Ё available on AltGr-Е as well. Same for Ъ and Ь.
- Dead keys which are "abstract" (as opposed to being related to letters engraved on the physical keyboard) should preferably be put on modified states of "zombie" keys of the keyboard (SPACE, TAB, CAPSLOCK, MENU_ACCESS).
  NOTE: Making Shift-Space a prefix key may lead to usability issues for people used to typing CAPITALIZED PHRASES by keeping Shift pressed all the time. As a minimum, the symbols accessed via Shift-SPACE key should be strikingly different from those produced by key so that such problems are noted ASAP. Example: at first sight, producing NO-BREAK SPACE on Shift-Space Shift-Space or Shift-Space Space looks like a good idea. Do not do this: the visually indistinguishable NO-BREAK SPACE would lead to very hard-to-debug problems if it was unintentional.
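The table-to-keyboard mapping suggested in the list above (Shift as the top half of each key, giving an 8×10 grid) can be written down directly. This is a sketch for the standard US layout only; the function name and the choice of the ten columns are assumptions.

```perl
use strict;
use warnings;

# The ten leftmost keys of each of the four main rows of the US layout.
my @key_rows = ('1234567890', 'qwertyuiop', 'asdfghjkl;', 'zxcvbnm,./');

# Map a cell of an 8x10 table to a keypress: table rows 0,1 go to the top
# keyboard row, rows 2,3 to the next one, etc.; the upper row of each pair
# is the Shift-ed ("top half") state of the key.
sub cell_to_keypress {
    my ($row, $col) = @_;
    die "Cell out of range\n" if $row < 0 or $row > 7 or $col < 0 or $col > 9;
    my $key = substr $key_rows[ int($row / 2) ], $col, 1;
    return ($row % 2 ? '' : 'Shift-') . $key;
}

print cell_to_keypress(2, 1), "\n";
print cell_to_keypress(7, 9), "\n";
```

A second table of the same shape would go to the same positions with AltGr added.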
Explanation of keyboard layout terms used in the docs
The aim of this module is to make keyboard layout design as simple as possible. It turns out that even very elaborate designs can be made quickly and the process is not very error-prone. It looks like certain avenues not tried before are now made possible; at least I'm not aware of other attempts in this direction. One can make layouts which can be "explained" very concisely, while they contain thousand(s) of accessible letters.
Unfortunately, being on uncharted territory, in my explanations I'm forced to use home-grown terms. So be patient with me... The terms are keyboard group, keyboard, face and layer. (I must compare them with what ISO 9995 does: http://en.wikipedia.org/wiki/ISO/IEC_9995...)
In what follows, the words letter and character are used interchangeably. A key means a physical key on a keyboard clicked (possibly together with one of the modifiers Shift, AltGr - or, rarely, Control). The key AltGr is either marked as such, or is just the "right" Alt key; at least on Windows it can be replaced by Control-Alt. A prefix key is a key tapping which does not produce any letter, but modifies what the next keypress would do (sometimes it is called a dead key; in ISO 9995 terms, it is probably a latching key).
A plain layer is a part of a keyboard layout accessible by using only non-prefix keys (possibly in combination with Shift); likewise, additional layers are parts of the layout accessible by combining the non-prefix keys with Shift (if needed) and with a particular combination of other modifiers (AltGr or Control). So there may be up to 2 additional layers: the AltGr-layer and the Control-layer.
On the simplest layouts, such as "US" or "Russian", there are no prefix keys - but this is only feasible for languages which use very few characters with diacritic marks. However, note that most layouts do not use the Control-layer - it is stated that this might be subject to problems with system interaction.
The primary face consists of the plain and additional layers of a keyboard; it is the part of the layout accessible without switching "sticky state" and without using prefix keys. There may be up to 3 layers (Primary, AltGr, Control) per face (on Windows). A secondary face is a face exposed after pressing a prefix key.
A personality is a collection of faces: the primary face, plus one face per defined prefix-key. Finally, a keyboard group is a collection of personalities (switchable by CapsLock and/or personality change hotkeys like Shift-Alt) designed to work smoothly together.
EXAMPLE: Start with a very elaborate (and not yet implemented, but feasible with this module) example. A keyboard group may consist of phonetically matched Latin and Cyrillic personalities, and visually matched Greek and Math personalities. Several prefix-keys may be shared by all 4 of these personalities; in particular, there would be 4 prefix-keys allowing access to the primary faces of these 4 personalities from other personalities of the group. Also, there may be specialized prefix-keys tuned for the particular needs of entering Latin script, Cyrillic script, Greek script, and Math.
Suppose that there are 8 specialized Latin prefix-keys (for example, name them
grave/tilde/hat/breve/ring_above/macron/acute/diaeresis
although in practice each one of them may do more than the name suggests). Then the Latin personality will have the following 13 faces:
Primary/Latin-Primary/Cyrillic-Primary/Greek-Primary/Math-Primary
grave/tilde/hat/breve/ring_above/macron/acute/diaeresis
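The count of 13 follows from the rule "the primary face, plus one face per defined prefix-key". A minimal sketch of that bookkeeping, in plain Perl and independent of the module's API (the helper `faces_of` is hypothetical and exists only for this illustration):

```perl
use strict;
use warnings;

# Names taken from the example in the text.
my @personalities  = qw(Latin Cyrillic Greek Math);
my @latin_prefixes = qw(grave tilde hat breve ring_above macron acute diaeresis);

# Hypothetical helper: the primary face, one face per "access" prefix-key,
# and one face per specialized prefix-key.
sub faces_of {
    my ($access, $special) = @_;
    return ('Primary', (map { $_ . '-Primary' } @$access), @$special);
}

my @faces = faces_of(\@personalities, \@latin_prefixes);
printf "%d faces: %s\n", scalar @faces, join '/', @faces;
```

Here 1 primary face, 4 access faces and 8 specialized faces add up to the 13 faces listed above.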
NOTE: Here Latin-Primary is the face one gets when one presses the Access-Latin prefix-key when in Latin mode; it may be convenient to define it to be the same as Primary - or maybe not. For example, if one defines it to be Greek-Primary, then this prefix-key has a convenient semantic of flipping between Latin and Greek modes for the next typed character: when in Latin, Latin-PREFIX-KEY a would enter α; when in Greek, the same keypresses [now meaning "Latin-PREFIX-KEY α"] would enter "a".
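The "flipping" semantic of the NOTE can be sketched with a toy model. This is an illustration only: the one-key faces and the names `%primary`, `%latin_prefix_face` and `type_with_latin_prefix` are assumptions made for the example, not part of the module.

```perl
use strict;
use warnings;
binmode STDOUT, ':encoding(UTF-8)';

# Toy primary faces with a single key each.
my %primary = (
    Latin => { a => 'a' },
    Greek => { a => "\x{3B1}" },    # GREEK SMALL LETTER ALPHA
);

# Which face the Access-Latin prefix-key exposes, by active personality.
# Assumption: in Latin mode it is defined to be Greek-Primary (the "flip").
my %latin_prefix_face = (Latin => 'Greek', Greek => 'Latin');

# Character produced by "Latin-PREFIX-KEY key" in the given personality.
sub type_with_latin_prefix {
    my ($active, $key) = @_;
    return $primary{ $latin_prefix_face{$active} }{$key};
}

print type_with_latin_prefix($_, 'a'), "\n" for qw(Latin Greek);
```

With this definition the same two keypresses produce the Greek letter in Latin mode and the Latin letter in Greek mode.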
Assume that the layout does not use the Control modifier. Then each of these faces would consist of two layers: the plain one, and the AltGr-one. For example, pressing AltGr with a key on the Greek face could add diaeresis to a vowel, or use a modified ("final" or "symbol") "glyph" for a consonant (as in σ/ς θ/ϑ). Or, on the Latin face, AltGr-a may produce æ. Or, on a Cyrillic personality, AltGr-я (ya) may produce ѣ (yat').
Likewise, the Greek personality may define special prefix-keys to access polytonic Greek vowels. (On the other hand, maybe this is not a very good idea - it may be more useful to make polytonic Greek accessible from all personalities in a keyboard group. Then one is able to type a polytonic Greek letter without switching to the Greek personality.)
With such a keyboard group, to type one Greek word in a Cyrillic text one would switch to the Greek personality, then back to Cyrillic; but when all one needs to type is a single Greek letter, it may be easier to use the "Greek-PREFIX-KEY letter" combination, and save switching back to the Cyrillic personality. (Of course, for this to work the letter should be on the primary face of the Greek personality.)
=====================================================
Looks too complicated? Try to think about it in a different way: there are many faces in a keyboard group; break them into 3 "onion rings":
- CORE faces
one can "switch to such a face" and type continuously using this face without pressing prefix keys. In other words, these faces can be made "active".
When another face is active, the letters in these faces are still accessible by pressing one particular prefix key before each of these letters. This prefix key does not depend on which core face is currently "active". (This is the same as for universally accessible faces.)
- Universally accessible faces
one cannot "switch to them", however, letters in these faces are accessible by pressing one particular prefix key before this letter. This prefix key does not depend on which core face is currently "active".
- Satellite faces
one cannot "switch to them", and letters in these faces are accessible from one particular core face only. One must press a prefix key before every letter in such faces.
For example, when entering a mix of Latin/Cyrillic scripts and math, it makes sense to make the base-Latin and base-Cyrillic faces into the core; it is convenient when (several) Math faces and a Greek face can be made universally accessible. On the other hand, faces containing diacritized Latin letters and diacritized Cyrillic letters should better be made satellite; this avoids a proliferation of prefix keys which would make typing slower.
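The three "onion rings" can be modelled as a small lookup. The sketch below is an illustration under assumed names (the face table `%ring`, its `home` field and the function `access` are hypothetical); it only encodes the accessibility rules stated above.

```perl
use strict;
use warnings;

# Hypothetical face table following the Latin/Cyrillic/Greek/Math example.
my %ring = (
    'Latin'          => { ring => 'core' },
    'Cyrillic'       => { ring => 'core' },
    'Greek'          => { ring => 'universal' },
    'Math'           => { ring => 'universal' },
    'Latin-acute'    => { ring => 'satellite', home => 'Latin' },
    'Cyrillic-acute' => { ring => 'satellite', home => 'Cyrillic' },
);

# How a letter of $face is reached while the core face $active is active:
#   'direct'      - no prefix key needed (the face itself is active)
#   'prefix'      - one prefix key before each letter
#   'unreachable' - a satellite of a different core face
sub access {
    my ($active, $face) = @_;
    my $f = $ring{$face} or die "unknown face $face";
    die "$active is not a core face"
        unless exists $ring{$active} and $ring{$active}{ring} eq 'core';
    return 'direct' if $face eq $active;
    return 'prefix' if $f->{ring} ne 'satellite';    # core or universal
    return $f->{home} eq $active ? 'prefix' : 'unreachable';
}

for my $face (sort keys %ring) {
    printf "%-15s from Cyrillic: %s\n", $face, access('Cyrillic', $face);
}
```

Keeping the diacritized faces in the satellite ring means their prefix keys only need to make sense within one core face, which is what limits the proliferation of prefix keys.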
Access to diacritic marks
The logic: prefix keys are either 8-bit characters with the high bit set or, if there is none with the needed glyph, "spacing modifier letters" or "spacing clones of diacritics". If you type something after them, you can get other modifier letters and combining characters, according to the following logic:
- The second press
The principal combining mark.
- Surrogate for the diacritic (either " or ')
The corresponding "prime shape"-modifier character.
- SPACE
The modifier character itself.
- NBSP
Modifier letter (the first one if the diacritic is 8-bit, the second one otherwise).
Some stats on prefix keys: ISO 9995-3 uses 26 prefix keys for diacritics; bépo uses 20, while EurKey uses 8. At the other end of the spectrum, there are 10 US keyboard keys with a "calculatable" relation to Latin diacritics:
`~^-'",./? --- grave/tilde/hat/macron/acute/diaeresis/cedilla/dot/stroke/hook-above
To this list one may add a "calculatable" key $ as the currency prefix; on the other hand, one should probably remove ? since AltGr-? should better be "set in stone" to denote ¿. If one adds Greek, then the calculatable positions for aspiration are on [ ] (or on ( )). Of widely used Latin diacritics, this leaves ring/hacek/breve/horn/ogonek/comma (and doubled grave/acute).
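The correspondence between the 10 keys and the diacritic names can be written down as a lookup table. A minimal sketch in plain Perl (the hash name `%prefix_for` and the label `currency` are chosen for this illustration; the module does not necessarily store the mapping this way):

```perl
use strict;
use warnings;

# The 10 "calculatable" US keys and the matching diacritics, in the order given.
my @keys  = split //, q(`~^-'",./?);
my @names = qw(grave tilde hat macron acute diaeresis cedilla dot stroke hook-above);

my %prefix_for;
@prefix_for{@keys} = @names;

# Adjustments suggested in the text: add $ as the currency prefix,
# and free ? so that AltGr-? can denote the inverted question mark.
$prefix_for{'$'} = 'currency';
delete $prefix_for{'?'};

printf "%s => %s\n", $_, $prefix_for{$_} for sort keys %prefix_for;
```

After the two adjustments the table still has 10 entries.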
CAVEATS for BÉPO keyboard:
Non-US keycaps: the key "a" still uses (VK_)A, but its scancode is now different. E.g., French's A is on 0x10, which is US's Q. Our table of scancodes is currently hardwired. Some pictures and tables are available on http://bepo.fr/wiki/Pilote_Windows
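To illustrate why a hardwired scancode table is a problem, here is a small sketch of how the same letters land on different scancodes when the keycaps are rearranged. The four-key AZERTY-style excerpt and the hash names are assumptions for the example; this is not the table used by the module.

```perl
use strict;
use warnings;

# Scancodes of four physical positions, named by their US keycaps.
my %us_scancode = (Q => 0x10, W => 0x11, A => 0x1E, Z => 0x2C);

# Assumed AZERTY-style placement: which letter sits at each US position.
my %french_letter_at = (Q => 'A', W => 'Z', A => 'Q', Z => 'W');

# Letter => scancode for the French arrangement.
my %french_scancode =
    map { ($french_letter_at{$_} => $us_scancode{$_}) } keys %us_scancode;

printf "VK_%s: US 0x%02X, French 0x%02X\n",
    $_, $us_scancode{$_}, $french_scancode{$_} for sort keys %us_scancode;
```

The letter A keeps its virtual key (VK_)A in both arrangements, but its scancode changes from 0x1E to 0x10, so a table keyed on US positions gives wrong results for such keycaps.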
FILES
Useful tidbits from Unicode mailing list (unsorted)
.... skew-orthogonal complement
Drachma: http://unicode.org/mail-arch/unicode-ml/y2012-m05/0167.html
http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3866.pdf
Pound
http://unicode.org/mail-arch/unicode-ml/y2012-m05/0242.html
MS keyboard (wrong?)
http://unicode.org/mail-arch/unicode-ml/y2012-m05/0268.html
w-ring is a stowaway
http://unicode.org/mail-arch/unicode-ml/y2012-m04/0043.html
History of squared pH
http://unicode.org/mail-arch/unicode-ml/y2012-m02/0123.html
Why and how to introduce innovative characters
http://unicode.org/mail-arch/unicode-ml/y2012-m01/0045.html
Upside-down text in CSS (remove position?)
http://unicode.org/mail-arch/unicode-ml/y2012-m01/0037.html
Classification of Dings (bats etc)
http://std.dkuug.dk/jtc1/sc2/wg2/docs/n4115.pdf
Escape: 2be9 2b9b
ARROW SHAFT - various
Math Almost-Text encoding
http://unicode.org/notes/tn28/UTN28-PlainTextMath-v3.pdf
http://unicode.org/mail-arch/unicode-ml/y2011-m10/0018.html
For me 1/2/3/4 means unambiguously ((1/2)/3)/4, i.e. 1/(2*3*4)
Unicode mostly encodes characters that are in use or have been
encoded in other standards. While not semantically agnostic, it is
much less oriented towards semantic clarifications and
distinctions than many people might hope for (and this includes
me, some of the time at least).
Unicode knows the concept of a provisional property
http://unicode.org/mail-arch/unicode-ml/y2011-m11/0142.html
http://unicode.org/reports/tr23/
http://unicode.org/mail-arch/unicode-ml/y2011-m11/0161.html
If you want to make analogies, however, the ISO ballots constitute
the *provisional* publication for character code points and names.
that needs to be available from day one for a character to be
implementable at all (such as decomp mappings, bidi class,
code point, name, etc.).
ZERO-WIDTH UNDEFINED DECOMPOSITION MARK
- to define decomposition, prepend it
Yiddish digraphs
http://unicode.org/mail-arch/unicode-ml/y2011-m10/0121.html
Locales
http://blog.kyero.com/2011/11/14/what-is-the-common-locale-data-repository/
http://blog.kyero.com/2010/12/02/lost-in-translation-locales-not-languages/
http://unicode.org/mail-arch/unicode-ml/y2006-m06/0203.html
Silly quotation marks: 201b, 201f
http://en.wikipedia.org/wiki/Quotation_mark_glyphs
http://unicode.org/mail-arch/unicode-ml/y2006-m06/0300.html
http://unicode.org/mail-arch/unicode-ml/y2006-m06/0317.html
http://en.wikipedia.org/wiki/Comma
http://en.wikipedia.org/wiki/%CA%BBOkina
http://en.wikipedia.org/wiki/Saltillo_%28linguistics%29
http://unicode.org/mail-arch/unicode-ml/y2006-m06/0367.html
http://unicode.org/unicode/reports/tr8/
under "4.6 Apostrophe Semantics Errata"
COMBINING GREEK YPOGEGRAMMENI equilibristic (depends on a vowel?)
http://unicode.org/mail-arch/unicode-ml/y2006-m06/0299.html
http://unicode.org/mail-arch/unicode-ml/y2006-m06/0308.html
http://www.tlg.uci.edu/~opoudjis/unicode/unicode_adscript.html
http://unicode.org/mail-arch/unicode-ml/y2008-m05/0046.html
General
http://ebixio.com/online_docs/UnicodeDemystified.pdf
Keyboard keys:
http://unicode.org/mail-arch/unicode-ml/Archives-Old/UML009/0204.html
Horizontal/vertical line/arrow extensions
http://unicode.org/charts/PDF/U2300.pdf
http://unicode.org/mail-arch/unicode-ml/y2003-m07/0513.html
http://std.dkuug.dk/JTC1/SC2/WG2/docs/n2508.htm
Cyrillic Script, Unicode status (+combining)
http://scriptsource.org/cms/scripts/page.php?item_id=entry_detail&uid=ngc339csy8
http://scriptsource.org/cms/scripts/page.php?item_id=entry_detail&uid=ktxptbccph
OHM: In modern usage, for new documents, this character should not be used.
http://unicode.org/mail-arch/unicode-ml/y2011-m08/0060.html
Substitute blank
http://unicode.org/mail-arch/unicode-ml/y2011-m07/0101.html
Representing invisible characters
http://unicode.org/mail-arch/unicode-ml/y2011-m07/0094.html
Diacritics in fonts
http://unicode.org/mail-arch/unicode-ml/y2011-m05/0047.html
http://www.user.uni-hannover.de/nhtcapri/combining-marks.html#greek
Unicode in 1889
http://www.archive.org/stream/unicodeuniversa00unkngoog#page/n3/mode/2up
On the other hand, having access to text only math symbols makes it possible to implement it in computer languages, making source code easier to read.
Right now, I feel there is a lack of keyboard maps. You can develop them on your own, but that is very time consuming.
http://unicode.org/mail-arch/unicode-ml/y2011-m04/0117.html
Licences (GPL etc) in TV sets
http://unicode.org/mail-arch/unicode-ml/y2009-m12/0092.html
Exciting new letter forms for English
http://www.theonion.com/articles/alphabet-updated-with-15-exciting-new-replacement,2869/
Similar glyphs:
http://unicode.org/reports/tr39/data/confusables.txt
Hyphens:
http://unicode.org/mail-arch/unicode-ml/y2009-m10/0038.html
GOST 10859
http://unicode.org/mail-arch/unicode-ml/y2009-m09/0082.html
http://www.mailcom.com/besm6/ACPU-128.jpg
Unicode to PostScript
http://unicode.org/mail-arch/unicode-ml/y2009-m06/0056.html
http://www.linuxfromscratch.org/blfs/view/svn/pst/enscript.html
http://unicode.org/mail-arch/unicode-ml/y2009-m06/0062.html
Linguists mailing lists
http://unicode.org/mail-arch/unicode-ml/y2009-m06/0066.html
GeoLocation by IP
http://unicode.org/mail-arch/unicode-ml/y2009-m04/0197.html
Per language character repertoire:
http://unicode.org/mail-arch/unicode-ml/y2009-m04/0253.html
http://unicode.org/mail-arch/unicode-ml/y2009-m04/0255.html
Compromises vs reality
http://unicode.org/mail-arch/unicode-ml/y2010-m02/0106.html
http://unicode.org/mail-arch/unicode-ml/y2010-m02/0117.html
Dates/numbers in Unicode
http://unicode.org/mail-arch/unicode-ml/y2010-m02/0122.html
Normalization FAQ
http://www.macchiato.com/unicode/nfc-faq
Hebrew char input
http://rishida.net/scripts/pickers/hebrew/
http://rishida.net/scripts/uniview/#title
Obsolete IPA
http://unicode.org/mail-arch/unicode-ml/y2009-m01/0487.html
Teuthonista (vowel guide p11, kbd p13)
http://www.sprachatlas.phil.uni-erlangen.de/materialien/Teuthonista_Handbuch.pdf
Greek letters for non-Greek
http://stephanus.tlg.uci.edu/~opoudjis/unicode/unicode_interloping.html#ipa
Pretty-printing text math
http://code.google.com/p/sympy/wiki/PrettyPrinting
Sub/Super on a terminal
http://unicode.org/mail-arch/unicode-ml/y2008-m07/0028.html
Apostrophe
http://unicode.org/mail-arch/unicode-ml/y2008-m05/0060.html
http://unicode.org/mail-arch/unicode-ml/y2008-m05/0063.html
http://unicode.org/mail-arch/unicode-ml/y2008-m05/0066.html
http://unicode.org/mail-arch/unicode-ml/y2007-m07/0251.html
http://unicode.org/mail-arch/unicode-ml/y2007-m05/0309.html
Uppercase eszett ß ẞ
http://unicode.org/mail-arch/unicode-ml/y2007-m05/0007.html
http://unicode.org/mail-arch/unicode-ml/y2007-m05/0008.html
http://unicode.org/mail-arch/unicode-ml/y2007-m05/0142.html
http://unicode.org/mail-arch/unicode-ml/y2007-m05/0045.html
http://unicode.org/mail-arch/unicode-ml/y2007-m05/0147.html
http://unicode.org/mail-arch/unicode-ml/y2007-m05/0170.html
http://unicode.org/mail-arch/unicode-ml/y2007-m05/0196.html
Questionnaire at start of Unicode proposal
http://unicode.org/mail-arch/unicode-ml/y2007-m05/0087.html
Rubi
http://en.wikipedia.org/wiki/Ruby_character#Unicode
Cyrillic soup
http://czyborra.com/charsets/cyrillic.html
Glottals
http://unicode.org/mail-arch/unicode-ml/y2008-m05/0151.html
http://unicode.org/mail-arch/unicode-ml/y2008-m05/0163.html
http://unicode.org/mail-arch/unicode-ml/y2008-m05/0202.html
http://unicode.org/mail-arch/unicode-ml/y2008-m05/0205.html
Tamil/ISCII
http://unicode.org/faq/indic.html
http://unicode.org/versions/Unicode6.1.0/ch09.pdf
CGI and OpenType
http://unicode.org/mail-arch/unicode-ml/y2008-m02/0097.html
Numbers in scripts ;-)
http://unicode.org/mail-arch/unicode-ml/y2008-m02/0120.html
Indicating coverage of the font
http://unicode.org/mail-arch/unicode-ml/y2008-m02/0152.html
http://unicode.org/mail-arch/unicode-ml/y2008-m02/0167.html
Proposing new stuff
http://unicode.org/mail-arch/unicode-ml/y2008-m01/0238.html
NOT and BROKEN BAR
http://unicode.org/mail-arch/unicode-ml/y2007-m12/0207.html
http://www.cs.tut.fi/~jkorpela/latin1/ascii-hist.html#5C
Accessing ligatures
http://unicode.org/mail-arch/unicode-ml/y2007-m11/0210.html
Should not use (roman numerals)
http://unicode.org/mail-arch/unicode-ml/y2007-m11/0253.html
Folding characters
http://unicode.org/reports/tr30/tr30-4.html
Ignorable glyphs
http://unicode.org/mail-arch/unicode-ml/y2007-m08/0132.html
http://unicode.org/mail-arch/unicode-ml/y2007-m08/0138.html
http://unicode.org/mail-arch/unicode-ml/y2007-m08/0120.html
Spacing: English and French
http://unicode.org/mail-arch/unicode-ml/y2006-m09/0167.html
http://unicode.org/mail-arch/unicode-ml/y2008-m05/0103.html
http://unicode.org/mail-arch/unicode-ml/y2007-m08/0138.html
HOWTO: (non)dummy VS in fonts
http://unicode.org/mail-arch/unicode-ml/y2007-m08/0118.html
OXIA vs TONOS
http://www.tlg.uci.edu/~opoudjis/unicode/unicode_gkbkgd.html#oxia
ZWSP ZWNJ WJ SHY NON-BREAKING HYPHEN
http://unicode.org/mail-arch/unicode-ml/y2007-m08/0123.html
http://unicode.org/mail-arch/unicode-ml/y2007-m07/0188.html
http://unicode.org/mail-arch/unicode-ml/y2007-m07/0199.html
http://unicode.org/mail-arch/unicode-ml/y2007-m07/0201.html
http://unicode.org/mail-arch/unicode-ml/y2007-m06/0122.html
http://unicode.org/mail-arch/unicode-ml/y2007-m01/0297.html
On which base to draw a "standalone" diacritic
http://unicode.org/mail-arch/unicode-ml/y2007-m07/0075.html
Universality vs affordability
http://unicode.org/mail-arch/unicode-ml/y2007-m07/0157.html
The IBM 1401 Hebrew Letter Key
http://www.qsm.co.il/Hebrew/HebKey.htm
Structure of development of Unicode
http://unicode.org/mail-arch/unicode-ml/y2006-m07/0056.html
http://unicode.org/mail-arch/unicode-ml/y2005-m07/0099.html
I don't have a problem with Unicode. It is what it is; it cannot
possibly be all things to all people:
http://unicode.org/mail-arch/unicode-ml/y2005-m07/0101.html
CR symbols
http://unicode.org/mail-arch/unicode-ml/y2006-m07/0163.html
Chicago Manual of Style
http://unicode.org/mail-arch/unicode-ml/y2006-m01/0127.html
Stability of normalization
http://unicode.org/mail-arch/unicode-ml/y2005-m07/0055.html
Writing systems vs written languages
http://unicode.org/mail-arch/unicode-ml/y2005-m07/0198.html
http://unicode.org/mail-arch/unicode-ml/y2005-m07/0241.html
MS Visual OpenType tables
http://www.microsoft.com/typography/VOLT.mspx
http://www.microsoft.com/typography
Coloring parts of ligatures; implementations:
http://unicode.org/mail-arch/unicode-ml/y2005-m06/0195.html
http://unicode.org/mail-arch/unicode-ml/y2005-m06/0233.html
http://unicode.org/mail-arch/unicode-ml/y2005-m06/0208.html
GPOS
http://unicode.org/mail-arch/unicode-ml/y2005-m06/0167.html
Combining power of generative features - implementor's view
http://unicode.org/mail-arch/unicode-ml/y2004-m09/0145.html
"Same" character Oacute used for different "functions" in the same text
http://unicode.org/mail-arch/unicode-ml/y2004-m08/0019.html
etc:
http://unicode.org/mail-arch/unicode-ml/y2004-m07/0227.html
Diacritics
http://www.sil.org/~gaultney/ProbsOfDiacDesignLowRes.pdf
http://en.wikipedia.org/wiki/Sylfaen_%28typeface%29
http://tiro.com/Articles/sylfaen_article.pdf
Variation sequences
http://unicode.org/mail-arch/unicode-ml/y2004-m07/0246.html
Federal vs regional aspects of Latinization (a lot of flak; cp1251)
http://peoples.org.ru/stenogramma.html
Sign writing
http://std.dkuug.dk/jtc1/sc2/wg2/docs/n4342.pdf
Writing digits in non-decimal
http://unicode.org/mail-arch/unicode-ml/y2011-m03/0050.html
Which separator is less ambiguous? Breve ˘ ? ␣ ? Inverted ␣ ?
Colors in Unicode names
http://unicode.org/mail-arch/unicode-ml/y2011-m03/0100.html
Use to identify a letter:
http://unicode.org/charts/collation/
A useful set of criteria for encoding symbols is found in Annex H of this document:
http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3002.pdf
What is a "Latin" char
http://unicode.org/forum/viewtopic.php?f=23&t=102
Perl has problems with unpaired surrogates (whole thread)
http://unicode.org/mail-arch/unicode-ml/y2010-m11/0034.html
Complex fonts (e.g., Indic)
http://unicode.org/mail-arch/unicode-ml/y2010-m10/0049.html
Complex glyphs in Symbola (pre-6.01) font may crash older versions of Windows
http://unicode.org/mail-arch/unicode-ml/y2010-m10/0082.html
http://unicode.org/mail-arch/unicode-ml/y2010-m10/0084.html
Windows 7 SP1 improvements
http://babelstone.blogspot.de/2010/05/prototyping-tangut-imes-or-why-windows.html
Middle dot is ambiguous
http://unicode.org/mail-arch/unicode-ml/y2010-m09/0023.html
Apostrophe as soft sign
http://unicode.org/mail-arch/unicode-ml/y2010-m08/0123.html
Chinese typesetting
http://idsgn.org/posts/the-end-of-movable-type-in-china/
Keyboards - agreement (5 scripts at end)
ftp://ftp.cen.eu/CEN/Sectors/List/ICT/CWAs/CWA-16108-2010-MEEK.pdf
LAMBDA vs LAMDA
http://unicode.org/mail-arch/unicode-ml/y2010-m06/0063.html
U+01BE LATIN LETTER INVERTED GLOTTAL STOP WITH STROKE; oi etc
http://unicode.org/notes/tn27/
Superscript == modifiers
http://unicode.org/mail-arch/unicode-ml/y2010-m03/0133.html
Need for a keyboard, keyman examples; why "standard" keyboards are doomed
http://unicode.org/mail-arch/unicode-ml/y2010-m01/0015.html
http://unicode.org/mail-arch/unicode-ml/y2010-m01/0022.html
http://unicode.org/mail-arch/unicode-ml/y2010-m01/0036.html
http://unicode.org/mail-arch/unicode-ml/y2010-m01/0053.html
@fonts and non-URL URIs
http://unicode.org/mail-arch/unicode-ml/y2010-m01/0156.html
How to encode Latin-in-fraktur
http://unicode.org/mail-arch/unicode-ml/y2007-m01/0279.html
http://unicode.org/mail-arch/unicode-ml/y2007-m01/0263.html
Math layout
http://unicode.org/mail-arch/unicode-ml/y2007-m01/0303.html
Book Spine reading direction
http://www.artlebedev.com/mandership/122/
Xerox and interrobang
http://unicode.org/mail-arch/unicode-ml/y2005-m04/0035.html
Translation of Unicode names
http://unicode.org/mail-arch/unicode-ml/y2012-m12/0066.html
http://unicode.org/mail-arch/unicode-ml/y2012-m12/0076.html
SEE ALSO
The keyboard(s) generated with this module: UI::KeyboardLayout::izKeys, http://k.ilyaz.org/
On diacritics:
http://www.phon.ucl.ac.uk/home/wells/dia/diacritics-revised.htm#two
http://en.wikipedia.org/wiki/Tonos#Unicode
http://en.wikipedia.org/wiki/Early_Cyrillic_alphabet#Numerals.2C_diacritics_and_punctuation
http://en.wikipedia.org/wiki/Vietnamese_alphabet#Tone_marks
http://diacritics.typo.cz/
http://en.wikipedia.org/wiki/User:TEB728/temp (Chars of languages)
http://www.evertype.com/alphabets/index.html
Accents in different Languages:
http://fonty.pl/porady,12,inne_diakrytyki.htm#07
http://en.wikipedia.org/wiki/Latin-derived_alphabet
Typesetting Old and Modern Church Slavonic
http://www.sanu.ac.rs/Cirilica/Prilozi/Skup.pdf
http://irmologion.ru/ucsenc/ucslay8.html
http://irmologion.ru/csscript/csscript.html
http://cslav.org/success.htm
http://irmologion.ru/developer/fontdev.html#allocating
On typography marks
http://wiki.neo-layout.org/wiki/Striche
http://www.matthias-kammerer.de/SonsTypo3.htm
http://en.wikipedia.org/wiki/Soft_hyphen
http://en.wikipedia.org/wiki/Dash
http://en.wikipedia.org/wiki/Ditto_mark
On keyboard layouts:
http://en.wikipedia.org/wiki/Keyboard_layout
http://en.wikipedia.org/wiki/Keyboard_layout#US-International
http://en.wikipedia.org/wiki/ISO/IEC_9995
http://www.pentzlin.com/info2-9995-3-V3.pdf (used almost nowhere - only half of keys in Canadian multilanguage match)
Discussion of layout changes and position of €:
https://www.libreoffice.org/bugzilla/show_bug.cgi?id=5981
http://msdn.microsoft.com/en-us/goglobal/bb964651
http://eurkey.steffen.bruentjen.eu/layout.html
http://ru.wikipedia.org/wiki/%D0%A4%D0%B0%D0%B9%D0%BB:Birman%27s_keyboard_layout.svg
http://bepo.fr/wiki/Accueil
http://cgit.freedesktop.org/xkeyboard-config/tree/symbols/ru
http://cgit.freedesktop.org/xkeyboard-config/tree/symbols/keypad
http://www.evertype.com/celtscript/type-keys.html (Old Irish mechanical typewriters)
http://eklhad.net/linux/app/halfqwerty.xkb (One-handed layout)
http://www.doink.ch/an-x11-keyboard-layout-for-scholars-of-old-germanic/ (and references there)
http://www.neo-layout.org/
https://commons.wikimedia.org/wiki/File:Neo2_keyboard_layout.svg
Images in (download of)
http://www.mzuther.de/en/contents/osd-neo2
Neo2 sources:
http://wiki.neo-layout.org/browser/windows/kbdneo2/Quelldateien
Shift keys at center, nice graphic:
http://www.tinkerwithabandon.com/twa/keyboarding.html
Physical keyboard:
http://www.konyin.com/?page=product.Multilingual%20Keyboard%20for%20UNITED%20STATES
Portable keyboard layout
http://www.autohotkey.com/forum/viewtopic.php?t=28447
One-handed
http://www.autohotkey.com/forum/topic1326.html
Typing on numeric keypad
http://goron.de/~johns/one-hand/#documentation
On screen keyboard indicator
http://www.autohotkey.com/docs/scripts/KeyboardOnScreen.htm
Phonetic Hebrew layout(s) (1st has many duplicates, 2nd overweighted)
http://bc.tech.coop/Hebrew-ZC.html
http://help.keymanweb.com/keyboards/keyboard_galaxiehebrewkm6.php
Greek (Galaxy) with a convenient mapping (except for Ψ) and BibleScript
http://www.tavultesoft.com/keyboarddownloads/%7B4D179548-1215-4167-8EF7-7F42B9B0C2A6%7D/manual.pdf
With 2-letter input of Unicode names:
http://www.jlg-utilities.com
Medievalist's
http://www.personal.leeds.ac.uk/~ecl6tam/
By author of MSKLC Michael S. Kaplan (do not forget to follow links)
http://blogs.msdn.com/b/michkap/archive/2006/03/26/560595.aspx
http://blogs.msdn.com/b/michkap/archive/2006/04/22/581107.aspx
Chaining dead keys:
http://blogs.msdn.com/b/michkap/archive/2011/04/16/10154700.aspx
Mapping VK to VSC etc:
http://blogs.msdn.com/b/michkap/archive/2006/08/29/729476.aspx
[Link] Remapping CapsLock to mean Backspace in a keyboard layout
(if repeat, every second Press counts ;-)
http://colemak.com/forum/viewtopic.php?id=870
Scancodes from kbd.h get in the way
http://blogs.msdn.com/b/michkap/archive/2006/08/30/726087.aspx
What happens if you start with .klc with other VK_ mappings:
http://blogs.msdn.com/b/michkap/archive/2010/11/03/10085336.aspx
Keyboards with Ctrl-Shift states:
http://blogs.msdn.com/b/michkap/archive/2010/10/08/10073124.aspx
On assigning Ctrl-values
http://blogs.msdn.com/b/michkap/archive/2008/11/04/9037027.aspx
On hotkeys for switching layouts:
http://blogs.msdn.com/b/michkap/archive/2008/07/16/8736898.aspx
Text services
http://blogs.msdn.com/b/michkap/archive/2008/06/30/8669123.aspx
Low-level access in MSKLC
http://levicki.net/articles/tips/2006/09/29/HOWTO_Build_keyboard_layouts_for_Windows_x64.php
http://blogs.msdn.com/b/michkap/archive/2011/04/09/10151666.aspx
On font linking
http://blogs.msdn.com/b/michkap/archive/2006/01/22/515864.aspx
Unicode in console
http://blogs.msdn.com/michkap/archive/2005/12/15/504092.aspx
Adding formerly "invisible" keys to the keyboard
http://blogs.msdn.com/b/michkap/archive/2006/09/26/771554.aspx
Redefining NumKeypad keys
http://blogs.msdn.com/b/michkap/archive/2007/07/04/3690200.aspx
BUT!!!
http://blogs.msdn.com/b/michkap/archive/2010/04/05/9988581.aspx
And backspace/return/etc
http://blogs.msdn.com/b/michkap/archive/2008/10/27/9018025.aspx
kbdutool.exe, run with the /S ==> .c files
Doing one's own WM_DEADKEY processing
http://blogs.msdn.com/b/michkap/archive/2006/09/10/748775.aspx
Dead keys do not work on SG-Caps
http://blogs.msdn.com/b/michkap/archive/2008/02/09/7564967.aspx
Dynamic keycaps keyboard
http://blogs.msdn.com/b/michkap/archive/2005/07/20/441227.aspx
Backslash/yen/won confusion
http://blogs.msdn.com/b/michkap/archive/2005/09/17/469941.aspx
Unicode output to console
http://blogs.msdn.com/b/michkap/archive/2010/10/07/10072032.aspx
Install/Load/Activate an input method/layout
http://blogs.msdn.com/b/michkap/archive/2007/12/01/6631463.aspx
http://blogs.msdn.com/b/michkap/archive/2008/05/23/8537281.aspx
Reset to a TT font from an application:
http://blogs.msdn.com/b/michkap/archive/2011/09/22/10215125.aspx
How to (not) treat C-A-Q
http://blogs.msdn.com/b/michkap/archive/2012/04/26/10297903.aspx
Treating Brazilian ABNT c1 c2 keys
http://blogs.msdn.com/b/michkap/archive/2006/10/07/799605.aspx
And JIS ¥|-key
(compare with http://www.scs.stanford.edu/11wi-cs140/pintos/specs/kbd/scancodes-7.html
http://hp.vector.co.jp/authors/VA003720/lpproj/others/kbdjpn.htm )
http://blogs.msdn.com/b/michkap/archive/2006/09/26/771554.aspx
Suggest a topic:
http://blogs.msdn.com/b/michkap/archive/2007/07/29/4120528.aspx#7119166
VK_OEM_8 Kana modifier - Using instead of AltGr
http://www.kbdedit.com/manual/ex13_replacing_altgr_with_kana.html
Limitations of using KANA toggle
http://www.kbdedit.com/manual/ex12_trilang_ser_cyr_lat_gre.html
FE (Far Eastern) keyboard source code example:
http://read.pudn.com/downloads3/sourcecode/windows/248345/win2k/private/ntos/w32/ntuser/kbd/fe_kbds/jpn/ibm02/kbdibm02.c__.htm
Investigation on relation between VK_ assignments, KBDEXT, KBDNUMPAD etc:
http://code.google.com/p/ergo-dvorak-for-developers/source/browse/trunk/kbddvp.c
PowerShell vs ISE
http://blogs.msdn.com/b/powershell/archive/2009/04/17/differences-between-the-ise-and-powershell-console.aspx
HTML consolidated entity names and discussion, MES charsets:
http://www.w3.org/TR/xml-entity-names
http://www.w3.org/2003/entities/2007/w3centities-f.ent
http://www.cl.cam.ac.uk/~mgk25/ucs/mes-2-rationale.html
http://web.archive.org/web/20000815100817/http://www.egt.ie/standards/iso10646/pdf/cwa13873.pdf
Ctrl2cap
http://technet.microsoft.com/en-us/sysinternals/bb897578
Low level scancode mapping
http://www.annoyances.org/exec/forum/winxp/r1017256194
http://web.archive.org/web/20030211001441/http://www.microsoft.com/hwdev/tech/input/w2kscan-map.asp
http://msdn.microsoft.com/en-us/windows/hardware/gg463447
http://www.annoyances.org/exec/forum/winxp/1034644655
???
http://netj.org/2004/07/windows_keymap
the free remapkey.exe utility that's in Microsoft NT / 2000 resource kit.
perl -wlne "BEGIN{$t = {T => q(), qw( X e0 Y e1 )}} print qq( $t->{$1}$2\t$3) if /^#define\s+([TXY])([0-9a-f]{2})\s+(?:_EQ|_NE)\((?:(?:\s*\w+\s*,){3})?\s*([^\W_]\w*)\s*(?:(?:,\s*\w+\s*){2})?\)\s*(?:\/\/.*)?$/i" kbd.h >ll2
then select stuff up to the first e1 key (but DECIMAL is not there T53 is DELETE??? take from MSKLC help/using/advanced/scancodes)
CapsLock as on typewriter:
http://www.annoyances.org/exec/forum/winxp/1071197341
Problems on X11:
http://www.x.org/releases/X11R7.6/doc/kbproto/xkbproto.html (definition of XKB???)
http://wiki.linuxquestions.org/wiki/Configuring_keyboards (current???)
http://wiki.linuxquestions.org/wiki/Accented_Characters (current???)
http://wiki.linuxquestions.org/wiki/Altering_or_Creating_Keyboard_Maps (current???)
https://help.ubuntu.com/community/ComposeKey (documents almost 1/2 of the needed stuff)
http://www.gentoo.org/doc/en/utf-8.xml (2005++ ???)
http://en.gentoo-wiki.com/wiki/X.Org/Input_drivers (2009++ HAS: How to make CapsLock change layouts)
http://www.freebsd.org/cgi/man.cgi?query=setxkbmap&sektion=1&manpath=X11R7.4
http://people.uleth.ca/~daniel.odonnell/Blog/custom-keyboard-in-linuxx11
http://shtrom.ssji.net/skb/xorg-ligatures.html (of 2008???)
http://tldp.org/HOWTO/Danish-HOWTO-2.html (of 2005???)
http://www.tux.org/~balsa/linux/deadkeys/index.html (of 1999???)
http://www.x.org/releases/X11R7.6/doc/libX11/Compose/en_US.UTF-8.html
http://cgit.freedesktop.org/xorg/proto/xproto/plain/keysymdef.h
EIGHT_LEVEL FOUR_LEVEL_ALPHABETIC FOUR_LEVEL_SEMIALPHABETIC PC_SYSRQ : see
http://cafbit.com/resource/mackeyboard/mackeyboard.xkb
./xkb in /etc/X11 /usr/local/X11 /usr/share/local/X11 /usr/share/X11
(maybe it is more productive to try
ls -d /*/*/xkb /*/*/*/xkb
?)
but what dead_diaeresis means is defined here:
Apparently, may be in /usr/X11R6/lib/X11/locale/en_US.UTF-8/Compose /usr/share/X11/locale/en_US.UTF-8/Compose
http://wiki.maemo.org/Remapping_keyboard
http://www.x.org/releases/current/doc/man/man8/mkcomposecache.8.xhtml
Note: having the XIM input method in GTK disables the Control-Shift-u way of entering HEX Unicode.
How to contribute:
http://www.freedesktop.org/wiki/Software/XKeyboardConfig/Rules
Note: the problems with handling deadkeys via .Compose are that: .Compose is handled by applications, while keymaps by server (since they may be on different machines, things can easily get out of sync); .Compose knows nothing about the current "Keyboard group" or of the state of CapsLock etc (therefore emulating "group switch" via composing is impossible).
JS code to add "insert these chars": google for editpage_specialchars_cyrilic, or
http://en.wikipedia.org/wiki/User:TEB728/monobook.jsx
Latin paleography
http://en.wikipedia.org/wiki/Latin_alphabet
http://tlt.its.psu.edu/suggestions/international/bylanguage/oenglish.html
http://guindo.pntic.mec.es/~jmag0042/LATIN_PALEOGRAPHY.pdf
http://www.evertype.com/standards/wynnyogh/ezhyogh.html
http://www.wordorigins.org/downloads/OELetters.doc
http://www.menota.uio.no/menota-entities.txt
http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2957.pdf (Incomplete???)
http://skaldic.arts.usyd.edu.au/db.php?table=mufi_char&if=mufi (No prioritization...)
Summary tables for Cyrillic
http://ru.wikipedia.org/wiki/%D0%9A%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0#.D0.A1.D0.BE.D0.B2.D1.80.D0.B5.D0.BC.D0.B5.D0.BD.D0.BD.D1.8B.D0.B5_.D0.BA.D0.B8.D1.80.D0.B8.D0.BB.D0.BB.D0.B8.D1.87.D0.B5.D1.81.D0.BA.D0.B8.D0.B5_.D0.B0.D0.BB.D1.84.D0.B0.D0.B2.D0.B8.D1.82.D1.8B_.D1.81.D0.BB.D0.B0.D0.B2.D1.8F.D0.BD.D1.81.D0.BA.D0.B8.D1.85_.D1.8F.D0.B7.D1.8B.D0.BA.D0.BE.D0.B2
http://ru.wikipedia.org/wiki/%D0%9F%D0%BE%D0%B7%D0%B8%D1%86%D0%B8%D0%B8_%D0%B1%D1%83%D0%BA%D0%B2_%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D1%8B_%D0%B2_%D0%B0%D0%BB%D1%84%D0%B0%D0%B2%D0%B8%D1%82%D0%B0%D1%85
http://en.wikipedia.org/wiki/List_of_Cyrillic_letters - per language tables
http://en.wikipedia.org/wiki/Cyrillic_alphabets#Summary_table
http://en.wiktionary.org/wiki/Appendix:Cyrillic_script
Extra chars (see also the ordering table on page 8)
http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3194.pdf
IPA
http://upload.wikimedia.org/wikipedia/commons/f/f5/IPA_chart_2005_png.svg
http://en.wikipedia.org/wiki/Obsolete_and_nonstandard_symbols_in_the_International_Phonetic_Alphabet
http://en.wikipedia.org/wiki/Case_variants_of_IPA_letters
Table with Unicode points marked:
http://www.staff.uni-marburg.de/~luedersb/IPA_CHART2005-UNICODE.pdf
(except for "Lateral flap" and "Epiglottal" column/row).
(Extended) IPA explained by consortium:
http://unicode.org/charts/PDF/U0250.pdf
IPA keyboard
http://www.rejc2.co.uk/ipakeyboard/
http://en.wikipedia.org/wiki/International_Phonetic_Alphabet_chart_for_English_dialects#cite_ref-r_11-0
Is this discussing KBDNLS_TYPE_TOGGLE on VK_KANA???
http://mychro.mydns.jp/~mychro/mt/2010/05/vk-f.html
Windows: fonts substitution/fallback/replacement
http://msdn.microsoft.com/en-us/goglobal/bb688134
Problems on Windows:
http://en.wikipedia.org/wiki/Help:Special_characters#Alt_keycodes_for_Windows_computers
http://en.wikipedia.org/wiki/Template_talk:Unicode#Plane_One_fonts
Console font: Lucida Console 14 is viewable, but has practically no Unicode support.
Consolas (good at 16) has much better Unicode support (sometimes better, sometimes worse than DejaVu)
DejaVu is good at 14 (equal to a GUI font size 9 on 15in 1300px screen; 16px unifont is native at 12 here)
http://cristianadam.blogspot.com/2009/11/windows-console-and-true-type-fonts.html
Apparently, Windows picks up the flavor (Bold/Italic/Etc) of DejaVu at random; see
http://jpsoft.com/forums/threads/strange-results-with-cp-1252.1129/
- he got it in bold. I'm getting it in italic... Workaround: uninstall
all flavors but one (the BOOK flavor), THEN enable it for the console... Then reinstall
(preferably newer versions).
Display (how WikiPedia does it):
http://en.wikipedia.org/wiki/Help:Special_characters#Displaying_special_characters
http://en.wikipedia.org/wiki/Template:Unicode
http://en.wikipedia.org/wiki/Template:Unichar
http://en.wikipedia.org/wiki/User:Ruud_Koot/Unicode_typefaces
In CSS: .IPA, .Unicode { font-family: "Arial Unicode MS", "Lucida Sans Unicode"; }
http://web.archive.org/web/20060913000000/http://en.wikipedia.org/wiki/Template:Unicode_fonts
Inspect which font is used by Firefox:
https://addons.mozilla.org/en-US/firefox/addon/fontinfo/
Windows shortcuts:
http://windows.microsoft.com/en-US/windows7/Keyboard-shortcuts
http://www.redgage.com/blogs/pankajugale/all-keyboard-shortcuts--very-useful.html
On meaning of Unicode math codepoints
http://milde.users.sourceforge.net/LUCR/Math/unimathsymbols.pdf
http://milde.users.sourceforge.net/LUCR/Math/data/unimathsymbols.txt
http://unicode.org/Public/math/revision-09/MathClass-9.txt
Monospaced fonts with combining marks (!)
https://bugs.freedesktop.org/show_bug.cgi?id=18614
https://bugs.freedesktop.org/show_bug.cgi?id=26941
Indic ISCII - any hope with it? (This is not representable...:)
http://unicode.org/mail-arch/unicode-ml/y2012-m09/0053.html
(Perceived) problems of Unicode (2001)
http://www.ibm.com/developerworks/library/u-secret.html
On a need to have input methods for unicode
http://unicode.org/mail-arch/unicode-ml/y2012-m07/0226.html
On info on Unicode chars
http://unicode.org/mail-arch/unicode-ml/y2012-m07/0415.html
Zapf dingbats encoding, and other fine points of AdobeGL:
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/ADOBE/zdingbat.txt
http://web.archive.org/web/20001015040951/http://partners.adobe.com/asn/developer/typeforum/unicodegn.html
Yet another (IMO, silly) way to handle '; fight: ' vs ` ´
http://www.cl.cam.ac.uk/~mgk25/ucs/apostrophe.html
Quoting tchrist: You can snag unichars, uniprops, and uninames from http://training.perl.com if you like.
Tom's unicode scripts
http://search.cpan.org/~bdfoy/Unicode-Tussle-1.03/lib/Unicode/Tussle.pm
.XCompose: on docs and examples
Syntax of .XCompose is (partially) documented in
http://www.x.org/archive/current/doc/man/man5/Compose.5.xhtml
http://cgit.freedesktop.org/xorg/lib/libX11/tree/man/Compose.man
# Modifiers are not documented
# (Shift, Alt, Lock, Ctrl with aliases Meta, Caps; apparently,
# ! is applied to a sequence without ~ ???)
The semantics (e.g., which of several keybindings takes precedence) are not documented. Experiments (see below) show that a longer binding wins; if the lengths are the same, the one which is loaded later wins. The relation with the presence of modifiers is not clear.
# (the source of imLcPrs.c shows that the expansion of the
# shorter sequence is stored too - but the presence of
# ->succession means that the code to process the resulting
# tree ignores the expansion).
Before the syntax was documented: For the best approximation, read the parser's code, e.g., google for
inurl:compose.c XCompose
site:cgit.freedesktop.org "XCompose"
site:cgit.freedesktop.org "XCompose" filetype:c
_XimParseStringFile
http://cgit.freedesktop.org/xorg/lib/libX11/tree/modules/im/ximcp/imLcIm.c
http://cgit.freedesktop.org/xorg/lib/libX11/tree/modules/im/ximcp/imLcPrs.c
http://uim.googlecode.com/svn-history/r6111/trunk/gtk/compose.c
http://uim.googlecode.com/svn/tags/uim-1.5.2/gtk/compose.c
The actual use of the compiled compose table:
http://cgit.freedesktop.org/xorg/lib/libX11/tree/modules/im/ximcp/imLcFlt.c
Apparently, the first node (= defined last) in the tree which matches keysym and modifiers is chosen. So to override <Foo> <Bar>, it looks like (checked to work!) ~Ctrl <Foo> may be used... On the other hand, defining both <Foo> <Bar> <Baz> and (later) <Foo> ~Ctrl <Bar>, one would expect that <Foo> <Ctrl-Bar> <Baz> should still trigger the expansion of <Foo> <Bar> <Baz> - but it does not... See also:
http://cgit.freedesktop.org/xorg/lib/libX11/tree/modules/im/ximcp/imLcLkup.c
The file .XCompose is processed by X11 clients on startup. Changes to this file should be seen immediately by all newly started clients (but GTK or QT applications may need extra config - see below), unless the directory ~/.compose-cache is present and has a cache file compatible with the binary architecture (then, until the cache expires - one day after creation - changes are not seen). The name .XCompose may be overridden by the environment variable XCOMPOSEFILE.
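To check whether a cache could be hiding recent edits, one can list the cache files which are younger than a day. A minimal Perl sketch, assuming the cache lives in ~/.compose-cache and expires 24 hours after its modification time, as described above; the helper name fresh_cache_files is made up for illustration, and nothing is assumed about how the cache files are named:

```perl
use strict;
use warnings;

# Hypothetical helper: return the plain files in $dir that were modified
# less than $max_age seconds before $now (candidates for a still-valid cache).
sub fresh_cache_files {
    my ($dir, $now, $max_age) = @_;
    opendir my $dh, $dir or return ();     # no such directory: no cache in effect
    my @files = map { "$dir/$_" } readdir $dh;
    closedir $dh;
    my @fresh = sort grep { -f $_ && $now - (stat $_)[9] < $max_age } @files;
    return @fresh;
}

my $home = defined $ENV{HOME} ? $ENV{HOME} : '.';
print "may still hide edits to .XCompose: $_\n"
    for fresh_cache_files("$home/.compose-cache", time, 24 * 60 * 60);
```

If any file is listed, removing it (or waiting until it expires) should make newly started clients re-read .XCompose.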
To get (better?) examples, google for "multi_key" partial alpha "DOUBLE-STRUCK".
# include these first, so they may be overridden later
include "%H/my-Compose/.XCompose-kragen"
include "%H/my-Compose/.XCompose-ootync"
include "%H/my-Compose/.XCompose-pSub"
Check success: kragen: \ space --> ␣; ootync: o F --> ℉; pSub: 0 0 --> ∞ ...
Older versions of X11 do not understand %L and %S, but do understand %H (e.g., Debian Squeeze 6.0.6; according to
http://packages.debian.org/search?keywords=x11-common
it has v 1:7.5+8+squeeze1).
include "/etc/X11/locale/en_US.UTF-8/Compose"
include "/usr/share/X11/locale/en_US.UTF-8/Compose"
Import default rules from the system Compose file: usually as above (but supported only on newer systems):
include "%L"
Detect the success of the lines above: get # by doing Compose + + ...
The next file to include has been generated by
perl -wlne 'next if /#\s+CIRCLED/; print if />\s+<.*>\s+<.*>\s+<.*/' /usr/share/X11/locale/en_US.UTF-8/Compose
### Std tables contain quadruple prefix for GREEK VOWELS and CIRCLED stuff
### only. But there is a lot of triple prefix...
perl -wne 'next if /#\s+CIRCLED/; $s{$1}++ or print qq( $1) if />\s+<.*>\s+<.*>\s+<.*"(.*)"/' /usr/share/X11/locale/en_US.UTF-8/Compose
## – — ☭ ª º Ǖ ǖ Ǘ ǘ Ǚ ǚ Ǜ ǜ Ǟ ǟ Ǡ ǡ Ǭ ǭ Ǻ ǻ Ǿ ǿ Ȫ ȫ Ȭ ȭ Ȱ ȱ ʰ ʱ ʲ ʳ ʴ ʵ ʶ ʷ ʸ ˠ ˡ ˢ ˣ ˤ ΐ ΰ Ḉ ḉ Ḕ ḕ Ḗ ḗ Ḝ ḝ Ḯ ḯ Ḹ ḹ Ṍ ṍ Ṏ ṏ Ṑ ṑ Ṓ ṓ Ṝ ṝ Ṥ ṥ Ṧ ṧ Ṩ ṩ Ṹ ṹ Ṻ ṻ Ấ ấ Ầ ầ Ẩ ẩ Ẫ ẫ Ậ ậ Ắ ắ Ằ ằ Ẳ ẳ Ẵ ẵ Ặ ặ Ế ế Ề ề Ể ể Ễ ễ Ệ ệ Ố ố Ồ ồ Ổ ổ Ỗ ỗ Ộ ộ Ớ ớ Ờ ờ Ở ở Ỡ ỡ Ợ ợ Ứ ứ Ừ ừ Ử ử Ữ ữ Ự ự ἂ ἃ ἄ ἅ ἆ ἇ Ἂ Ἃ Ἄ Ἅ Ἆ Ἇ ἒ ἓ ἔ ἕ Ἒ Ἓ Ἔ Ἕ ἢ ἣ ἤ ἥ ἦ ἧ Ἢ Ἣ Ἤ Ἥ Ἦ Ἧ ἲ ἳ ἴ ἵ ἶ ἷ Ἲ Ἳ Ἴ Ἵ Ἶ Ἷ ὂ ὃ ὄ ὅ Ὂ Ὃ Ὄ Ὅ ὒ ὓ ὔ ὕ ὖ ὗ Ὓ Ὕ Ὗ ὢ ὣ ὤ ὥ ὦ ὧ Ὢ Ὣ Ὤ Ὥ Ὦ Ὧ ᾀ ᾁ ᾂ ᾃ ᾄ ᾅ ᾆ ᾇ ᾈ ᾉ ᾊ ᾋ ᾌ ᾍ ᾎ ᾏ ᾐ ᾑ ᾒ ᾓ ᾔ ᾕ ᾖ ᾗ ᾘ ᾙ ᾚ ᾛ ᾜ ᾝ ᾞ ᾟ ᾠ ᾡ ᾢ ᾣ ᾤ ᾥ ᾦ ᾧ ᾨ ᾩ ᾪ ᾫ ᾬ ᾭ ᾮ ᾯ ᾲ ᾴ ᾷ ῂ ῄ ῇ ῒ ῗ ῢ ῧ ῲ ῴ ῷ ⁱ ⁿ ℠ ™ שּׁ שּׂ а̏ А̏ е̏ Е̏ и̏ И̏ о̏ О̏ у̏ У̏ р̏ Р̏ 🙌
The following excerpt from the NEO compose tables may be good if you use keyboards which do not generate dead keys, but may generate Cyrillic keys; in other situations, edit the filtering/naming in the following download command and in the include line below. (For my taste, most bindings are useless since they contain keysymbols which may be generated with NEO, but not with less intimidating key layouts.)
(Filtering may be important, since having a large file may significantly slow down a client's startup (without ~/.compose-cache???).)
# perl -wle 'foreach (qw(base cyrillic greek lang math)) {my @i=@ARGV; $i[-1] .= qq($_.module?format=txt); system @i}' wget -O - http://wiki.neo-layout.org/browser/Compose/src/ | perl -wlne 'print unless /<(U[\dA-F]{4,6}>|dead_|Greek_)/' > .XCompose-neo-no-Udigits-no-dead-no-Greek
include "%H/.XCompose-neo-no-Udigits-no-dead-no-Greek"
# detect the success of the line above: get ♫ by doing Compose Compose (but this binding is overwritten later!)
###################################### Neo's Math contains junk at line 312
Print with something like (loading in a web browser after this):
perl -l examples/filter-XCompose ~/.XCompose-neo-no-Udigits-no-dead-no-Greek > ! o-neo
env LC_ALL=C sort -f o-neo | column -x -c 130 > ! /tmp/oo-neo-x
“Systematic” parts of rules in a few .XCompose
================== .XCompose b=bepo o=ootync k=kragen p=pSub s=std
b Double-Struck b
o circled ops b
O big circled ops b
r rotated b 8ACETUv ∞
- sub p
= double arrows po
g greek po
m math p |=Double-Struck rest haphazard...
O circles p Oo
S stars p Ss
^ sup p added: i -
| daggers p
Double mathop ok +*&|%8CNPQRZ AE
# thick-black arrows o
-,Num- arrows o
N/N fractions o
hH pointing hands o
O circled ops o
o degree o
rR roman nums o
\ UP upper modifiers o
\ DN lower modifiers o
{ set theoretic o
| arrows |-->flavors o
UP / roots o
LFT DN 6-quotes, bold delim o
RT DN 9-quotes, bold delim o
UP,DN super,sub o
DOUBLE-separated-by-& op k ( )
in-() circled k xx for tensor
in-[] boxed, dice, play-cards k
BKSP after revert k
< after revert k
` after small-caps k
' after hook k
, after hook below k
h after phonetic k
# musical k
%0 ROMAN k %_0 for two-digit
% roman k %_ for two-digit
* stars k
*. var-greek k
* greek k
++, 3 triple k
+ double k
, quotes k
!, / negate k
6,9 6,9-quotes k
N N fractions k
= double-arrows, RET k
CMP x2 long names k
f hand, pencils k
\ combining??? k
^ super, up modifier k
_ low modifiers k
|B, |W chess, checkers, B&W k
| double-struck k
ARROWS ARROWS k
! dot below s
" diaeresis s
' acute s
trail < left delimiter s
trail > right delimiter s
trail \ sloped variant s
( ... ) circled s
( greek aspirations s
) greek aspirations s
+ horn s
, cedilla s
. dot above s
- hor. bar s
/ diag, vert hor. bar s
; ogonek s
= double hor.bar s
trail = double hor.bar s
? hook above s
b breve s
c check above s
iota iota below s
trail 0338 negated s
o ring above s
U breve s
SOME HEBREW
^ circumflex s
^ _ superscript s
^ undbr superscript s
_ bar s
_ subscript s
underbr subscript s
` grave s
~ greek dieresis s
~ tilde s
overbar bar s
´ acute s ´ is not '
¸ cedilla s ¸ is cedilla
LIMITATIONS
Currently only output for Windows keyboard layout drivers (via MSKLC) is available.
Currently only the keyboards with US-mapping of hardware keys to "the etched symbols" are supported (think of German physical keyboards where Y/Z keycaps are swapped: Z is etched between T and U, and Y is to the left of X, or French which swaps A and Q, or French or Russian physical keyboards which have more alphabetical keys than 26).
Currently no LIGATURES are supported.
While the architecture of assembling a keyboard from small easy-to-describe pieces is (IMO) elegant and very powerful, and has proven to be useful, it still looks like a collection of independent hacks. Many of these hacks look quite similar; it would be great to find a way to unify them, and so reduce the repertoire of operations for assembly.
The current documentation is a hodge-podge of semi-coherent rambling.
The implementation of the module is crumbling under its weight. Its evolution was by bloating (even when some design features were simplified). Since initially I had very little clue to which level of abstraction and flexibility the keyboard description would evolve, bloating accumulated to incredible amounts.
UNICODE TABLE GOTCHAS
APL symbols with UP TACK and DOWN TACK look reverted w.r.t. other UP TACK and DOWN TACK symbols. (We base our mutation on the names, not glyphs.)
LESS-THAN, FULL MOON, GREATER-THAN, EQUALS, GREEK RHO, MALE are defined with SYMBOL or SIGN at end, but (may) drop it when combined with modifiers via WITH. Likewise for SUBSET OF, SUPERSET OF, CONTAINS AS MEMBER, PARALLEL TO, EQUIVALENT TO, IDENTICAL TO.
Sometimes the opposite happens, and SIGN appears out of the blue; compare:
2A18 INTEGRAL WITH TIMES SIGN
2A19 INTEGRAL WITH INTERSECTION
ENG is a combination of n with HOOK, but it is not marked as such in its name.
Sometimes the name of a diacritic (after WITH) acquires an ACCENT at end (see U+0476).
Oftentimes the part to the left of WITH is not resolvable: sometimes it is underspecified (e.g., just TRIANGLE), sometimes it is overspecified (e.g., in LEFT VERTICAL BAR WITH QUILL), sometimes it should be understood as a word (e.g., in END WITH LEFTWARDS ARROW ABOVE). Sometimes it just does not exist (e.g., LATIN LETTER REVERSED GLOTTAL STOP WITH STROKE - there is LATIN LETTER INVERTED GLOTTAL STOP, but not the reversed variant). Sometimes it is a defined synonym (VERTICAL BAR).
Sometimes it has something appended (N-ARY UNION OPERATOR WITH DOT).
Sometimes WITH is just a clarification (RIGHTWARDS HARPOON WITH BARB DOWNWARDS).
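The first two gotchas (a dropped SIGN/SYMBOL, an unresolvable left part) can be illustrated by a small Perl sketch. This is not the code of the module: the helper names split_with and resolve_base are hypothetical, and the %known hash is sample data standing in for a parsed NamesList.txt.

```perl
use strict;
use warnings;

# Sample data only: a few real character names standing in for NamesList.txt.
my %known = map { $_ => 1 } 'LESS-THAN SIGN', 'MULTIPLICATION SIGN', 'INTEGRAL';

# Split a name at the first " WITH "; returns (base, modifiers) or (name, undef).
sub split_with {
    my ($name) = @_;
    return ($1, $2) if $name =~ /^(.+?) WITH (.+)$/;
    return ($name, undef);
}

# Try the base as written, then with the suffixes which names tend to drop.
sub resolve_base {
    my ($base, $names) = @_;
    for my $try ($base, "$base SIGN", "$base SYMBOL") {
        return $try if $names->{$try};
    }
    return;    # underspecified, overspecified, a plain word, or missing
}

for my $name ('LESS-THAN WITH DOT', 'INTEGRAL WITH TIMES SIGN',
              'END WITH LEFTWARDS ARROW ABOVE') {
    my ($base, $mod) = split_with($name);
    my $found = resolve_base($base, \%known);
    printf "%-32s base=<%s> mod=<%s> -> %s\n", $name, $base, $mod,
        defined $found ? $found : 'unresolved';
}
```

With this sample data LESS-THAN resolves only after appending SIGN, INTEGRAL resolves as written, and END stays unresolved.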
1 AND
1 ANTENNA
1 ARABIC MATHEMATICAL OPERATOR HAH
1 ARABIC MATHEMATICAL OPERATOR MEEM
1 ARABIC ROUNDED HIGH STOP
1 ARABIC SMALL HIGH LIGATURE ALEF
1 ARABIC SMALL HIGH LIGATURE QAF
1 ARABIC SMALL HIGH LIGATURE SAD
1 BACK
1 BLACK SUN
1 BRIDE
1 BROKEN CIRCLE
1 CIRCLED HORIZONTAL BAR
1 CIRCLED MULTIPLICATION SIGN
1 CLOSED INTERSECTION
1 CLOSED LOCK
1 COMBINING LEFTWARDS HARPOON
1 COMBINING RIGHTWARDS HARPOON
1 CONGRUENT
1 COUPLE
1 DIAMOND SHAPE
1 END
1 EQUIVALENT
1 FISH CAKE
1 FROWNING FACE
1 GLOBE
1 GRINNING CAT FACE
1 HEAVY OVAL
1 HELMET
1 HORIZONTAL MALE
1 IDENTICAL
1 INFINITY NEGATED
1 INTEGRAL AVERAGE
1 INTERSECTION BESIDE AND JOINED
1 KISSING CAT FACE
1 LATIN CAPITAL LETTER REVERSED C
1 LATIN CAPITAL LETTER SMALL Q
1 LATIN LETTER REVERSED GLOTTAL STOP
1 LATIN LETTER TWO
1 LATIN SMALL CAPITAL LETTER I
1 LATIN SMALL CAPITAL LETTER U
1 LATIN SMALL LETTER LAMBDA
1 LATIN SMALL LETTER REVERSED R
1 LATIN SMALL LETTER TC DIGRAPH
1 LATIN SMALL LETTER TH
1 LEFT VERTICAL BAR
1 LOWER RIGHT CORNER
1 MEASURED RIGHT ANGLE
1 MONEY
1 MUSICAL SYMBOL
1 NIGHT
1 NOTCHED LEFT SEMICIRCLE
1 ON
1 OR
1 PAGE
1 RIGHT ANGLE VARIANT
1 RIGHT DOUBLE ARROW
1 RIGHT VERTICAL BAR
1 RUNNING SHIRT
1 SEMIDIRECT PRODUCT
1 SIX POINTED STAR
1 SMALL VEE
1 SOON
1 SQUARED UP
1 SUMMATION
1 SUPERSET BESIDE AND JOINED BY DASH
1 TOP
1 TOP ARC CLOCKWISE ARROW
1 TRIPLE VERTICAL BAR
1 UNION BESIDE AND JOINED
1 UPPER LEFT CORNER
1 VERTICAL BAR
1 VERTICAL MALE
1 WHITE SUN
2 CLOSED MAILBOX
2 CLOSED UNION
2 DENTISTRY SYMBOL LIGHT VERTICAL
2 DOWN-POINTING TRIANGLE
2 HEART
2 LEFT ARROW
2 LINE INTEGRATION
2 N-ARY UNION OPERATOR
2 OPEN MAILBOX
2 PARALLEL
2 RIGHT ARROW
2 SMALL CONTAINS
2 SMILING CAT FACE
2 TIMES
2 TRIPLE HORIZONTAL BAR
2 UP-POINTING TRIANGLE
2 VERTICAL KANA REPEAT
3 CHART
3 CONTAINS
3 TRIANGLE
4 BANKNOTE
4 DIAMOND
4 PERSON
5 LEFTWARDS TWO-HEADED ARROW
5 RIGHTWARDS TWO-HEADED ARROW
8 DOWNWARDS HARPOON
8 UPWARDS HARPOON
9 SMILING FACE
11 CIRCLE
11 FACE
11 LEFTWARDS HARPOON
11 RIGHTWARDS HARPOON
15 SQUARE
perl -wlane "next unless /^Unresolved: <(.*?)>/; $s{$1}++; END{print qq($s{$_}\t$_) for keys %s}" oxx-us2 | sort -n > oxx-us2-sorted-kw
SQUARE WITH specifies fill - not combining. FACE is not combining, same for HARPOONs.
Only CIRCLE WITH HORIZONTAL BAR is combining. Triangle is combining only with underbar and dot above.
TRIANGLE means WHITE UP-POINTING TRIANGLE. DIAMOND - WHITE DIAMOND (so do many others.) TIMES means MULTIPLICATION SIGN; but CIRCLED MULTIPLICATION SIGN means CIRCLED TIMES - go figure! CIRCLED HORIZONTAL BAR WITH NOTCH is not a decomposition (it is "something circled").
Another way of compositing is OVER (but not UNDER!) and FROM BAR. See also ABOVE, BELOW - but only BELOW LONG DASH. Avoid WITH/AND after these.
TWO HEADED should replace TWO-HEADED. LEFT ARROW means LEFTWARDS ARROW, same for RIGHT. DIAMOND SHAPE means DIAMOND - actually just a bug - http://www.reddit.com/r/programming/comments/fv8ao/unicode_600_standard_published/? LINE INTEGRATION means CONTOUR INTEGRAL. INTEGRAL AVERAGE means INTEGRAL. SUMMATION means N-ARY SUMMATION. INFINITY NEGATED means INFINITY.
HEART means WHITE HEART SUIT. TRIPLE HORIZONTAL BAR looks genuinely missing...
SEMIDIRECT PRODUCT means one of two, left or right???
This better be convertible by rounding/sharpening, but see BUT NOT/WITH NOT/OR NOT/AND SINGLE LINE NOT/ABOVE SINGLE LINE NOT/ABOVE NOT
2268 LESS-THAN BUT NOT EQUAL TO; 1.1
2269 GREATER-THAN BUT NOT EQUAL TO; 1.1
228A SUBSET OF WITH NOT EQUAL TO; 1.1
228B SUPERSET OF WITH NOT EQUAL TO; 1.1
@ Relations
22E4 SQUARE IMAGE OF OR NOT EQUAL TO; 1.1
22E5 SQUARE ORIGINAL OF OR NOT EQUAL TO; 1.1
@@ 2A00 Supplemental Mathematical Operators 2AFF
@ Relational operators
2A87 LESS-THAN AND SINGLE-LINE NOT EQUAL TO; 3.2
x (less-than but not equal to - 2268)
2A88 GREATER-THAN AND SINGLE-LINE NOT EQUAL TO; 3.2
x (greater-than but not equal to - 2269)
2AB1 PRECEDES ABOVE SINGLE-LINE NOT EQUAL TO; 3.2
2AB2 SUCCEEDS ABOVE SINGLE-LINE NOT EQUAL TO; 3.2
2AB5 PRECEDES ABOVE NOT EQUAL TO; 3.2
2AB6 SUCCEEDS ABOVE NOT EQUAL TO; 3.2
@ Subset and superset relations
2ACB SUBSET OF ABOVE NOT EQUAL TO; 3.2
2ACC SUPERSET OF ABOVE NOT EQUAL TO; 3.2
Looking into the v6.1 reference PDFs: 2268, 2269, 2AB5, 2AB6, 2ACB, 2ACC have two horizontal bars, while 228A, 228B, 22E4, 22E5, 2A87, 2A88, 2AB1, 2AB2 have one horizontal bar. Hence BUT NOT EQUAL TO and ABOVE NOT EQUAL TO are equivalent; so are WITH NOT EQUAL TO, OR NOT EQUAL TO, AND SINGLE-LINE NOT EQUAL TO and ABOVE SINGLE-LINE NOT EQUAL TO. (Square variants come only with one horizontal line?)
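The equivalence classes above can be written down as a lookup table. The following Perl sketch (the function name classify_not_equal is hypothetical, and this is not how the module treats these names) splits such a name into its left part and the number of horizontal bars observed in the charts:

```perl
use strict;
use warnings;

# Number of horizontal bars for each connective, as observed in the v6.1 charts.
my %bars = (
    'BUT'               => 2,
    'ABOVE'             => 2,
    'WITH'              => 1,
    'OR'                => 1,
    'AND SINGLE-LINE'   => 1,
    'ABOVE SINGLE-LINE' => 1,
);

# Longest connectives first, so ABOVE SINGLE-LINE is tried before ABOVE.
my $alt = join '|', map { quotemeta($_) }
          sort { length($b) <=> length($a) } keys %bars;

# Returns (left part, number of bars), or an empty list for other names.
sub classify_not_equal {
    my ($name) = @_;
    return unless $name =~ /^(.+?) ($alt) NOT EQUAL TO$/;
    return ($1, $bars{$2});
}

for my $name ('LESS-THAN BUT NOT EQUAL TO',
              'PRECEDES ABOVE SINGLE-LINE NOT EQUAL TO',
              'SQUARE IMAGE OF OR NOT EQUAL TO') {
    my ($left, $n) = classify_not_equal($name);
    print "$name: <$left> with $n bar(s)\n";
}
```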
Set $ENV{UI_KEYBOARDLAYOUT_UNRESOLVED} to enable warnings. Then do
perl -wlane "next unless /^Unresolved: <(.*?)>/; $s{$1}++; END{print qq($s{$_}\t$_) for keys %s}" oxx | sort -n > oxx-sorted-kw
COPYRIGHT
Copyright (c) 2011-2012 Ilya Zakharevich <ilyaz@cpan.org>
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.0 or, at your option, any later version of Perl 5 you may have available.
The distributed examples may have their own copyrights.
TODO
UniPolyK-MultiSymple
Multiple linked faces (accessible as described in ChangeLog); designated Primary- and Secondary- switch keys (as Shift-Space and AltGr-Space now).
Soft hyphen as a deadkey may be not a good idea: following it by a special key (such as Shift-Tab, or Control-Enter) may insert the deadkey character??? Hence the character should be highly visible... (Now the key is invisible, so this is irrelevant...)
Currently linked layers must have exactly the same number of keys in VK-tables.
VK tables for TAB, BACK were BS. Same (remains) for the rest of unusual keys... (See TAB-was.) But UTOOL cannot handle them anyway...
Define an extra element in VK keys: linkable. Should be sorted first in the kbd map, and there should be the same number in linked lists. Non-linkable keys should not be linked together by deadkey access...
Interaction of FromToFlipShift with SelectRX not intuitive. This works: Diacritic[<sub>](SelectRX[[0-9]](FlipShift(Latin)))
DefinedTo cannot be put on Cyrillic 3a9 (yo to superscript disappears - due to duplication???).
... so we do it differently now, but: LinkLayer was not aggressively resolving all the occurrences of a character on a layer before we started to combine it with Diacritic_if_undef... - and Cyrillic 3a9 is not helped...
via_parent() is broken - cannot replace for Diacritic_if_undef.
Currently, we map epigraphic letters to capital letters - is it intuitive???
dotted circle ◌ 25CC
DeadKey_Map200A= FlipLayers #DeadKey_Map200A_0= Id(Russian-AltGr) #DeadKey_Map200A_1= Id(Russian) performs differently from the commented variant: it adds links to auto-filled keys...
Why ¨ on THIN SPACE inserts OGONEK after making ¨ multifaceted???
When splitting a name on OVER/BELOW/ABOVE, we need both sides as modifiers???
Ỳ currently unreachable (appears only in Latin-8 Celtic, is not on Wikipedia)
Somebody is putting an extra element at the end of arrays for layers??? - Probably SPACE...
Need to treat upside-down as a pseudo-decomposition.
We decompose reversed-smallcaps in one step - probably better add yet another two-steps variant...
When creating a <pseudo-stuff> treat SYMBOL/SIGN/FINAL FORM/ISOLATED FORM/INITIAL FORM/MEDIAL FORM; note that SIGN may be stripped: LESS-THAN SIGN becomes LESS-THAN WITH DOT
We do not do canonical-merging of diacritics; so one needs to specify VARIA in addition to GRAVE ACCENT.
We use a smartish algorithm to assign multiple diacritics to the same deadkey. A REALLY smart algorithm would use information about when a particular precombined form was introduced in Unicode...
Inspector tool for NamesList.txt:
grep " WITH .* " ! | grep -E -v "(ACUTE|GRAVE|ABOVE|BELOW|TILDE|DIAERESIS|DOT|HOOK|LEG|MACRON|BREVE|CARON|STROKE|TAIL|TONOS|BAR|DOTS|ACCENT|HALF RING|VARIA|OXIA|PERISPOMENI|YPOGEGRAMMENI|PROSGEGRAMMENI|OVERLAY|(TIP|BARB|CORNER) ([A-Z]+WARDS|UP|DOWN|RIGHT|LEFT))$" | grep -E -v "((ISOLATED|MEDIAL|FINAL|INITIAL) FORM|SIGN|SYMBOL)$" |less
grep " WITH " ! | grep -E -v "(ACUTE|GRAVE|ABOVE|BELOW|TILDE|DIAERESIS|CIRCUMFLEX|CEDILLA|OGONEK|DOT|HOOK|LEG|MACRON|BREVE|CARON|STROKE|TAIL|TONOS|BAR|CURL|BELT|HORN|DOTS|LOOP|ACCENT|RING|TICK|HALF RING|COMMA|FLOURISH|TITLO|UPTURN|DESCENDER|VRACHY|QUILL|BASE|ARC|CHECK|STRIKETHROUGH|NOTCH|CIRCLE|VARIA|OXIA|PSILI|DASIA|DIALYTIKA|PERISPOMENI|YPOGEGRAMMENI|PROSGEGRAMMENI|OVERLAY|(TIP|BARB|CORNER) ([A-Z]+WARDS|UP|DOWN|RIGHT|LEFT))$" | grep -E -v "((ISOLATED|MEDIAL|FINAL|INITIAL) FORM|SIGN|SYMBOL)$" |less
AltGrMap should be made CapsLock aware (impossible: smart capslock works only on the first layer, so the dead char must be on the first layer). [May work for Shift-Space - but it has a bag of problems...]
Alas, CapsLock'ing a composition cannot be made stepwise. Hence one must calculate it directly. (Oops, Windows CapsLock is not configurable on AltGr-layer. One may need to convert it to VK_KANA???)
WarnConflicts[exceptions] and NoConflicts translation map parsing rules.
Need a way to map to a different face, not a different layer.
Vietnamese: to put second accent over ă, ơ (o/horn), put them over ae/oe; - including another ˘ which would "cancel the implied one", so will get o-horn itself. - Except for acute accent which should be replaced by ¨, and hook must be replaced by ˆ. (Over ae/oe there is only macron and diaeresis over ae.)
Or: for the purpose of taking a second accent, AltGr-A behaves as Ă (or Â?), AltGr-O behaves as Ô (or O-horn Ơ?). Then Å and O/ behave as the other one... And ˚ puts the dot *below*, macron puts a hook. Exception: ¨ acts as ´ on the unaltered AE.
While Å takes acute accent, one can always input it via putting ˚ on Á.
If Ê is on the keyboard (and macron puts a hook), then the only problem is how to enter a hook alone (double circumflex is not precombined), dot below (???), and accents on u-horn ư.
Mogrification rules for double accents: AE Å OE O/ Ù mogrify into hatted/horned versions; macron mogrifies into a hook; second hat modifies a hat into a horn. The only problem: one won't be able to enter double grave on U - use the OTHER combination of ¨ and `... And how to enter dot below on non-accented aue? Put ¨ on umlaut? What about Ë?
To allow . or , on VK_DECIMAL: maybe make CapsLock-dependent?
http://blogs.msdn.com/b/michkap/archive/2006/09/13/752377.aspx
How to write this diacritic recipe: insert hacheck on AltGr-variant, but only if the breve on the base layer variant does not insert hacheck (so inserts breve)???
Sorting diacritics by usefulness: we want to apply one of accents from the given list to a given key (with l layers of 2 shift states). For each accent, we have 2l possible variants for composition; assign to 2 variants differing by Shift the minimum penalty of the two. For each layer we get several possible combinations of different priority; and for each layer, we have a certain number of slots open. We can redistribute combinations from the primary layer to secondary one, but not between secondary layers.
Work with slots one-by-one (so that the assignment is "monotonic" when the number of slots increases). Let m be the number of layers where slots are present. Take highest priority combinations; if the number of "extra" combinations in the primary layer is at least m, distribute the first m of them to secondary layers. If n<m of them are present, first fill the k layers which have no combinations of their own, then other n-k layers. More precisely, if n<=k, use the first n of "free" layers; if n>k, fill all free layers, then the last n-k of non-free layers.
Repeat as needed (on each step, at most one slot in each layer appears).
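One step of this redistribution (choosing which secondary layers receive the n "extra" combinations) may be sketched in Perl as follows. The function name pick_layers and the representation of layers as plain numbers are assumptions made for the illustration; assigning concrete combinations and penalties is not covered.

```perl
use strict;
use warnings;

# $n:       number of "extra" combinations available in the primary layer
# $free:    array ref, layers with an open slot and no combinations of their own
# $nonfree: array ref, layers with an open slot which have their own combinations
# Returns the layers which receive one extra combination on this step.
sub pick_layers {
    my ($n, $free, $nonfree) = @_;
    my $k = @$free;
    my $m = $k + @$nonfree;
    $n = $m if $n > $m;                       # at most one slot per layer per step
    return @$free[0 .. $n - 1] if $n <= $k;   # the first n of the free layers
    my $rest = $n - $k;                       # then the last n-k non-free layers
    return (@$free, @$nonfree[-$rest .. -1]);
}

# Layers 1 and 3 are free, layers 2 and 4 have their own combinations.
print join(' ', pick_layers(3, [1, 3], [2, 4])), "\n";   # prints "1 3 4"
```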
But we do not need to separate case-differing keys! How to fix?
All done, but this works only on the current face! To fix, need to pass to the translator all the face-characters present on the given key simultaneously.
===== Accent-key TAB accesses extra bindings (including NUM->numbered one) (may be problematic with some applications??? -- so duplicate it on + and @ if they are not occupied -- there is nothing related to AT in Unicode)
Diacritics_0218_0b56_0c34= May create such a thing... (0b56_0c34 invisible to the user).
Hmm - how to combine penalized keys with reversion? It looks like
the higher priority bindings would occupy the hottest slots in both
direct and reverse bindings...
Maybe additional forms Diacritics2S_* and Diacritics2E_* which fight
for symbols of the same penalty from start and from end (with S winning
on stuff exactly in the middle...). (The E-form would also strip the last |-group.)
' Shift-Space (from US face) should access the second level of Russian face. To avoid infinite cycles, face-switch keys to non-private faces should be marked in each face...
"Acute makes sharper" is applicable to () too to get <>-parens...
Other ways of combining: "OR EQUAL TO", "OR EQUIVALENT TO", "APL FUNCTIONAL SYMBOL QUAD", "APL FUNCTIONAL SYMBOL *** UNDERBAR", "APL FUNCTIONAL SYMBOL *** DIAERESIS".
When recognizing symbols for GREEK, treat LUNATE (as NOP). Try adding HEBREW LETTER at start as well...
Compare with: 8 basic accents: http://en.wikipedia.org/wiki/African_reference_alphabet (English 78)
When a diacritic on a base letter expands to several variants, use them all (with penalty according to the flags).
Problem: acute on acute makes double acute modifier...
Penalized letters are temporarily completely ignored; need to attach them in the end... - but not 02dd which should be completely ignored...
Report characters available on diacritic chains, but not accessible via such chains. Likewise for characters not accessible at all. Mark certain chains as "Hacks" so that they are not counted in these lists.
Long s and "preceded by" are not handled since the table has its own (useless) compatibility decompositions.
╒╤╕ ╞╪╡ ╘╧╛ ╓╥╖ ╟╫╢ ╙╨╜ ╔╦╗ ╠╬╣ ╚╩╝ ┌┬┐ ├┼┤ └┴┘ ┎┰┒ ┠╂┨ ┖┸┚ ┍┯┑ ┝┿┥ ┕┷┙ ┏┳┓ ┣╋┫ ┗┻┛ On top of a light-lines grid (3×2, 2×3, 2×2; H, V, V+H): ┲┱ ╊╉ ┺┹ ┢╈┪ ┡╇┩ ╆╅ ╄╇ ╼━╾╺╸╶─╴╌┄┈ ╍┅┉ ╻ ┃ ╹ ╷ │ ╵
╽ ╿ ╎┆┊╏┇┋
╲ ╱ ╳ ╭╮ ╰╯ ◤▲◥ ◀■▶ ◣▼◢ ◜△◝ ◁□▷ ◟▽◞ ◕◓◔ ◐○◑ ◒ ▗▄▖ ▐█▌ ▝▀▘ ▛▀▜ ▌ ▐ ▙▄▟
░▒▓
WINDOWS GOTCHAS
First of all, keyboard layouts on Windows are controlled by DLLs; the only function of these DLLs is to export a table of "actions" to perform. This table is passed to the kernel, and that's it - whatever is not supported by the format of this table cannot be implemented by native layouts. (The DLL performs no "actions" when actual keyboard events arrive.)
Essentially, the logic is like that: there are primary "keypresses", and chained "keypresses" ("prefix keys" [= deadkeys] and keys pressed after them). Primary keypresses are distinguished by which physical key on keyboard is pressed, and which of "modifier keys" are also pressed at this moment (as well as the state of "latched keys" - usually CapsLock only). This combination determines which Unicode character is generated by the keypress, and whether this character starts a "chained sequence".
On the other hand, the behaviour of chained keys is governed ONLY by Unicode characters they generate: if there are several physical keypresses generating the same Unicode characters, these keypresses are completely interchangeable inside a chained sequence. (The only restriction is that the first keypress should be marked as "prefix key"; for example, there may be two keys producing - so that one is producing a "real dash sign", and another is producing a "prefix" -.)
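A toy model of this logic in Perl (a sketch only: the key names, both tables and the function type_keys are hypothetical, and the handling of a pair with no entry in the chained table is an assumption, not a description of what Windows does):

```perl
use strict;
use warnings;

# Hypothetical primary table: "key with modifiers" => [character, is_prefix].
my %primary = (
    'MINUS'       => ['-', 0],    # a "real dash sign"
    'AltGr-MINUS' => ['-', 1],    # the same character, marked as a prefix key
    'A'           => ['a', 0],
);

# Hypothetical chained table, indexed by characters only:
# prefix character => { base character => [output character, is_prefix] }.
my %chained = (
    '-' => { 'a' => ["\x{101}",  0],     # a with macron
             '-' => ["\x{2013}", 0] },   # en dash
);

sub type_keys {
    my ($out, $pending) = ('');
    for my $key (@_) {
        my ($char, $is_prefix) = @{ $primary{$key} || die "unbound key $key\n" };
        if (defined $pending) {
            my $hit = $chained{$pending}{$char};
            # Assumption of this sketch: an unmatched pair is emitted as is.
            ($char, $is_prefix) = $hit ? @$hit : ("$pending$char", 0);
            undef $pending;
        }
        if ($is_prefix) { $pending = $char } else { $out .= $char }
    }
    return $out;
}

printf "U+%04X\n", ord type_keys('AltGr-MINUS', 'A');   # prints U+0101
```

Inside the chained sequence only the generated character matters: type_keys('AltGr-MINUS', 'MINUS') and type_keys('AltGr-MINUS', 'AltGr-MINUS') give the same result, while 'MINUS' in the first position just produces a dash.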
The table allows: to map ScanCodes to VK_keys; to associate a VK_key to several (numbered) choices of characters to output, and mark some of these choices as prefixes (deadkeys). (These "base" choices may contain up to 4 16-bit characters (with 32-bit characters mapped to 2 16-bit surrogates); but only those with 1 16-bit character may be marked as deadkeys.) For each prefix character (not a prefix key!) one can associate a table mapping input 16-bit "base characters" to output 16-bit characters, and mark some of the output choices as prefix characters.
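The size limits are counted in 16-bit units, not in characters. A small Perl sketch of the check (the helper names utf16_units, fits_base_choice and may_be_deadkey are made up for the illustration; the limits are the ones stated above):

```perl
use strict;
use warnings;

# Split a string into the 16-bit units which the table would have to store.
sub utf16_units {
    my @units;
    for my $cp (map { ord($_) } split //, $_[0]) {
        if ($cp > 0xFFFF) {                    # needs a surrogate pair
            $cp -= 0x10000;
            push @units, 0xD800 + ($cp >> 10), 0xDC00 + ($cp & 0x3FF);
        } else {
            push @units, $cp;
        }
    }
    return @units;
}

# A "base" choice holds up to 4 units; a deadkey must be exactly 1 unit.
sub fits_base_choice { my @u = utf16_units($_[0]); return @u >= 1 && @u <= 4 }
sub may_be_deadkey   { my @u = utf16_units($_[0]); return @u == 1 }

printf "%04X ", $_ for utf16_units("\x{1F64C}");   # prints D83D DE4C
print "\n";
```

So a character outside of the BMP may be bound as a base choice (it takes 2 of the 4 units), but cannot be marked as a deadkey.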
The numbered choices above are determined by the state of "modifier keys" (such as Shift, Alt, Control), but not directly. First of all, VK_keys may be associated to a certain combination of 6 "modifier bits" (called "logical" Shift, Alt, Control, Kana, User1 and User2, but the logical bits are not required to coincide with names of modifier keys). (Example: bind Right Control to activate Shift and Kana bits.) The 64 possible combinations of modifier bits are mapped to the numbered choices above.
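This two-step lookup (keys to bits, bits to a numbered choice) may be modelled in Perl like this. It is a sketch: the numeric values of the bits, the bindings in %key_bits and the contents of @choice are assumptions chosen for the illustration, not data of an actual layout.

```perl
use strict;
use warnings;

# The 6 logical modifier bits; the numeric values are an assumption of this sketch.
my %BIT = (Shift => 1, Control => 2, Alt => 4, Kana => 8, User1 => 16, User2 => 32);

# Hypothetical binding of modifier keys to combinations of logical bits.
my %key_bits = (
    'Left Shift'    => ['Shift'],
    'Left Control'  => ['Control'],
    'Right Alt'     => ['Control', 'Alt'],   # AltGr
    'Right Control' => ['Shift', 'Kana'],    # the example from the text
);

# 64-entry table: bit combination => numbered choice (undef: nothing is bound).
my @choice = (undef) x 64;
@choice[0, 1, 6, 7, 8, 9] = (0 .. 5);

sub mask_for {
    my $mask = 0;
    $mask |= $BIT{$_} for map { @{ $key_bits{$_} } } @_;
    return $mask;
}
sub choice_for { return $choice[ mask_for(@_) ] }

printf "Right Control: mask %d, choice %d\n",
    mask_for('Right Control'), choice_for('Right Control');   # mask 9, choice 5
```

Here bare Left Control gives mask 2, which is left unbound in this table (compare the warning about bare Control bindings below).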
Additionally, one can define two "separate numbered choices" in presence of CapsLock (but the only allowed modifier bit is Shift). Another way to determine what CapsLock is doing: one can mark that it flips the "logical Shift" bit (separately on the no-modifiers state, the Control-Alt-only state, and the Kana-only state [?!] - here "only" allows for the Shift bit to be ON).
The AltGr key is considered equivalent to the Control-Alt combination (if those are present, or always???), and one cannot bind Alt and Alt-Shift combinations. Additionally, binding the bare Control modifier on alphabetical keys (and SPACE, [, ], \) may confuse some applications.
NOTE: there is some additional stuff allowed to be done (but only in presence of Far_East_Support installed???). FE-keyboards can define some sticky state (so may define some other "latching" keys in addition to CapsLock). However, I did not find a clear documentation yet (keyboard106 in the DDK toolkit???).
There is a tool to create/compile the required DLL: kbdutool.exe of MicroSoft Keyboard Layout Creator (with a graphic frontend MSKLC.exe). The tool does not support customization of modifier bits, and has numerous bugs concerning binding keys which usually do not generate characters. The graphic frontend does not support chained prefix keys, adds another batch of bugs, and has arbitrary limitations: it refuses to work if the compiled version of the keyboard is already installed, and refuses to work if SPACE is redefined in useful ways.
WORKFLOW: uninstall the keyboard, comment out the definition of SPACE, load in MSKLC and create an install package. Then uncomment the definition of SPACE, and compile 4 architecture versions using kbdutool, moving the DLLs into suitable directories of the install package. Install the keyboard.
During the development cycle, one does not need to rebuild the install package when recompiling.
- Several similar MSKLC created keyboards may confuse the system
-
Apparently, the system may get majorly confused when the description of the project gets changed without changing the DLL (=project) name. (Tested only with Win7 and the name in the DESCRIPTIONS section coinciding with the name on the KBD line - both in *.klc file.)
The symptoms: I know how one can get 5 different lists of keyboards:
1. Click on the keyboard icon in the Language Bar - usually shown on the toolbar; positioned to the right of the language code EN/RU etc (keyboard icon is not shown if only one keyboard is associated to the current language).
2. Go to the Input Language settings (e.g., right-click on the Language bar, Settings, General).
3. On this General page, press the Add button, go to the language in question.
4. Check the .klc files for recently installed Input Languages.
5. In MS Keyboard Layout Creator, go to the File/Load Existing Keyboard list.
It looks like the first 4 get in sync if one deletes all related keyboards, then installs the necessary subset. I do not know how to fix 5 - MSKLC continues to show the old name for this project.
Another symptom: Current language indicator (like EN) on the language bar disappears. (Reboot time?)
Is it related to the ***\Local Settings\MuiCache\*** hive???
Possible workaround: manually remove the entry in HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Control\Keyboard Layouts (the last 4 digits match the codepage in the .klc file).
- Too long description (or funny characters in description?)
-
If the name in the DESCRIPTIONS section is too long, the name shown in list 2 above may be empty. (Checked only on Win7 and when the name in the DESCRIPTIONS section coincides with the name on the KBD line - both in *.klc file.)
(Fixed by shortening the name [but see "Several similar MSKLC created keyboards may confuse the system" above!], so maybe it was not the length but some particular character (+?) which was confusing the system. I saw a report on an MSKLC bug when the description had the apostrophe character '.)
- MSKLC ruins names of dead key when reading a .klc
-
When reading a .klc file, MS Keyboard Layout Creator may ruin the names of dead keys. Symptom: open the dialogue for a dead key mapping (click the key, check that Dead key view has checkmark, click on the ... button near the Dead key? checkbox); then the name (the first entry field) contains some junk. (Looks like a long ASCII string U+0030 U+0030 U+0061 U+0039.)
Workaround: if all one needs is to compile a .klc, one can run KBDUTOOL directly.
Workaround: correct ALL these names manually in MSKLC. If the names are the Unicode name for the dead character, just click the Default button near the entry field. Do this for ALL the dead keys in all the registers (including SPACE!). If CapsLock is not made "semantically meaningful", there are 6 views of the keyboard (PLAIN, Ctrl, Ctrl+Shift, Shift, AltGr, AltGr+Shift) - check them all for grayed out keys (=deadkeys).
Check for success: File/"Save Source File As", use a temporary name. Inspect near the end of the generated .klc file. If OK, you can go to the Project/Build menu. (Likewise, this way lets you find which deadkeys' names need to be fixed.)
!!! This is time-consuming !!! Make sure that other things are OK before you do this (by Project/Validate, Project/Test).
BTW: It might be that this is cosmetic only. I do not know any bad effect - but I did not try to use any tool with visual feedback on the currently active sub-layout of keyboard.
- Double bug in KBDUTOOL with dead characters above 0x0fff
-
This line in .klc file is treated correctly by MSKLC's builtin keyboard tester:
39 SPACE 0 0020 00a0@ 0020 2009@ 200a@ // , , , , // SPACE, NO-BREAK SPACE, SPACE, THIN SPACE, HAIR SPACE
However, via kbdutool it produces the following two bugs:
    static ALLOC_SECTION_LDATA MODIFIERS CharModifiers = {
        &aVkToBits[0],
        7,
        {
        //  Modification# //  Keys Pressed
        //  ============= // =============
            0,            //
            1,            // Shift
            2,            // Control
            SHFT_INVALID, // Shift + Control
            SHFT_INVALID, // Menu
            SHFT_INVALID, // Shift + Menu
            3,            // Control + Menu
            4             // Shift + Control + Menu
        }
    };
    .....................................
      {VK_SPACE ,0 ,' '      ,WCH_DEAD ,' '      ,WCH_LGTR ,WCH_LGTR },
      {0xff     ,0 ,WCH_NONE ,0x00a0   ,WCH_NONE ,WCH_NONE ,WCH_NONE },
    .....................................
    static ALLOC_SECTION_LDATA LIGATURE2 aLigature[] = {
      {VK_SPACE ,6 ,0x2009 ,0x2009 },
      {VK_SPACE ,7 ,0x200a ,0x200a },
Essentially, 2009@ 200a@ produce LIGATURES (= multiple 16-bit chars) instead of deadkeys. Moreover, these ligatures are put on non-existing "modifications" 6, 7 (the maximal modification defined is 4; so the code uses the Shift + Control + Menu flags instead of the "modification number" in the ligatures table).
- MSKLC keyboards handle Ctrl-Shift-letter differently than US keyboard
-
At least in console applications, the US keyboard produces (as the “string value”) the corresponding Control-letter when Ctrl-Shift-letter is pressed. MSKLC does not reproduce this behaviour. This may break an application if it was not specifically tested with “complicated” keyboards.
The only way to fix this from the “naive” keyboard layout DLL (i.e., the kind that MSKLC generates) which I found is to explicitly include Ctrl-Shift as a handled combination, and return Ctrl-letter on such keypresses. (This is enabled in the keyboards generated by this module - not customizable in v0.12.)
- Default keyboard of an application
-
Apparently, there is no way to choose a default keyboard for a certain language. The configuration UI allows moving keyboards up and down in the list, but, apparently, this order is not related to which keyboard is selected when an application starts.
- Hex input of unicode is not enabled
-
One needs to explicitly tinker with the registry (see examples/enable-hex-unicode-entry.reg) and then reboot to enable this.
- Standard fonts have some chars exchanged
-
At least in Consolas and Lucida Sans Unicode φ and ϕ are exchanged. Compare with Courier and Times.
- The console font configuration
-
It is controlled by the Registry hive
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont
The key 0 usually gives Lucida Console, and the key 00 gives Consolas. Adding random numbers does not work; however, if one adds one more zero (at least when adding to a sequence of zeros), one can add more fonts. You need to export this hive (e.g., use reg export "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont" console-ttf.reg), save a copy (so you can always restore if the love goes sour), then edit the resulting file.
So if the maximal key with 0s is 00, add one extra row with an extra 0 at end, and the name of your font. To find the name, look into the hive HKLM\Software\Microsoft\Windows NT\CurrentVersion\Fonts. For example, after installing DejaVuSansMono.ttf, I see DejaVu Sans Mono (TrueType) as a key in this hive. So I add a line "000"="DejaVu Sans Mono" so that the result is now (omitting Far Eastern fonts)
    Windows Registry Editor Version 5.00

    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont]
    "949"="..."
    "0"="Lucida Console"
    "950"="..."
    "932"="..."
    "936"="..."
    "00"="Consolas"
    "000"="DejaVu Sans Mono"
The full file is in examples/console-fonts00-added.reg. After importing this file via reg (or giving it as a parameter to regedit; requires administrative privileges) the font is immediately available in the menu. (However, it does not work in "existing" console windows, only in newly created windows.)
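Finding the right number of zeros can be automated. The following Perl sketch scans the text of an exported file for the longest all-zeros name and builds the line to add; the function name next_font_line is made up for this example, and the text is assumed to be already decoded into characters (reg export usually writes UTF-16, so decode the file first).

```perl
use strict;
use warnings;

# Given the (decoded) text of the exported TrueTypeFont hive and a font name,
# return the line to append: one more 0 than the longest all-zeros name found.
sub next_font_line {
    my ($reg_text, $font_name) = @_;
    my $longest = '';
    while ($reg_text =~ /^"(0+)"=/mg) {
        $longest = $1 if length $1 > length $longest;
    }
    return sprintf qq("%s"="%s"), $longest . '0', $font_name;
}
```

For an export where the longest such name is 00, next_font_line($text, 'DejaVu Sans Mono') produces the line "000"="DejaVu Sans Mono" used above.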
(Do not use the example file directly. First inspect the hive exported on your system, and find the number of 0s to use. Then add a new line with the correct number of zeros - as a value, one can use the string above. This will preserve the defaults of your setup. Keep in mind that selection-by-fontfamily is buggy: if you have more than one version of the font in different weights, it is Russian roulette which one of them will be taken (at least for DejaVu, which uses Book as the default weight). First install the "normal" flavor of the font, then do as above (so the system has no way of picking the wrong flavor!), and only after this install the remaining flavors.)
One more remark: for desktop icons coming from the “Public” user (“shared” icons) which start a console application, the default font is not directly editable. To reset it, one must:
copy the .lnk icon file to “your” desktop directory;
start the application using the “new” icon;
change the font via “Properties” of the window's menu;
as administrator, copy the .lnk file back to the Public/Desktop directory (usually in something like C:/Users). Manually refresh the desktop. Verify that the “old” icon works as expected. (Now you can remove the “new” icon created on the first step.)
- AltGr-keypresses going nowhere
-
Some AltGr-keypresses do not result in the corresponding letter on keyboard being inserted. It looks like they are stolen by some system-wide hotkeys. See: http://www.kbdedit.com/manual/ex13_replacing_altgr_with_kana.html
If these keypresses would perform some action, one might be able to deduce how to disable the hotkeys. So the real problem comes when the keypress is silently dropped.
I found out one scenario how this might happen, and how to fix this particular situation. (Unfortunately, this did not fix what I see, where AltGr-s [but not AltGr-S] is stolen.) When installing a shortcut, one can associate a hotkey to the shortcut. Unfortunately, the UI allows (and encourages!) hotkeys of the form <Control-Alt-letter> (which are equivalent to AltGr-letter) - instead of safe combinations like Control-Alt-F4 or Alt-Shift-letter (which do not go to keyboard drivers, so cannot generate characters). If/when an application linked to by this shortcut is gone, the hotkey remains, but now it does nothing (no warning or dialogue comes).
If the shortcut is installed in one of the "standard places", one can find it. Save this to K:\findhotkey.vbs (replace K: by the suitable drive letter here and below)
    on error resume next
    set WshShell = WScript.CreateObject("WScript.Shell")
    Dim A
    Dim Ag
    Set Ag=Wscript.Arguments
    If Ag.Count > 0 then
      For x = 0 to Ag.Count -1
        A = A & Ag(x)
      Next
    End If
    Set FSO = CreateObject("Scripting.FileSystemObject")
    f=FSO.GetFile(A)
    set lnk = WshShell.CreateShortcut(A)
    If lnk.hotkey <> "" then
      msgbox A & vbcrlf & lnk.hotkey
    End If
Save this to K:\findhotkey.cmd
    set findhotkey=k:\findhotkey
    for /r %%A in (*.lnk) do %findhotkey%.vbs "%%A"
    for /r %%A in (*.pif) do %findhotkey%.vbs "%%A"
    for /r %%A in (*.url) do %findhotkey%.vbs "%%A"
    cd /d %UserProfile%\desktop
    for /r %%A in (*.lnk) do %findhotkey%.vbs "%%A"
    for /r %%A in (*.pif) do %findhotkey%.vbs "%%A"
    for /r %%A in (*.url) do %findhotkey%.vbs "%%A"
    cd /d %AllUsersProfile%\desktop
    for /r %%A in (*.lnk) do %findhotkey%.vbs "%%A"
    for /r %%A in (*.pif) do %findhotkey%.vbs "%%A"
    for /r %%A in (*.url) do %findhotkey%.vbs "%%A"
    cd /d %UserProfile%\Start Menu
    for /r %%A in (*.lnk) do %findhotkey%.vbs "%%A"
    for /r %%A in (*.pif) do %findhotkey%.vbs "%%A"
    for /r %%A in (*.url) do %findhotkey%.vbs "%%A"
    cd /d %AllUsersProfile%\Start Menu
    for /r %%A in (*.lnk) do %findhotkey%.vbs "%%A"
    for /r %%A in (*.pif) do %findhotkey%.vbs "%%A"
    for /r %%A in (*.url) do %findhotkey%.vbs "%%A"
    cd /d %APPDATA%
    for /r %%A in (*.lnk) do %findhotkey%.vbs "%%A"
    for /r %%A in (*.pif) do %findhotkey%.vbs "%%A"
    for /r %%A in (*.url) do %findhotkey%.vbs "%%A"
    cd /d %HOMEDRIVE%%HOMEPATH%
    for /r %%A in (*.lnk) do %findhotkey%.vbs "%%A"
    for /r %%A in (*.pif) do %findhotkey%.vbs "%%A"
    for /r %%A in (*.url) do %findhotkey%.vbs "%%A"
(In most situations, only the section after the last cd /d is important; in my configuration all the "interesting" stuff is in %APPDATA%.) Running this should find all shortcuts which define hot keys.
Run the cmd file. Repeat in the "All users"/"Public" directory. It should show a dialogue for every shortcut with a hotkey it finds. (But, as I said, it did not fix my problem: AltGr-s works in MSKLC test window, and nowhere else I tried...)
- "There was a problem loading the file" from MSKLC
-
Make line endings in .klc DOSish.
- AltGr-keys do not work
-
Make line endings in .klc DOSish (when given as input to kbdutool - it gives no error messages, and deadkeys work [?!]).
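For the two items above, a small Perl helper is enough to make the line endings DOSish. This is a sketch only: the name to_dos_eol is made up here, and it works on already decoded text (for a UTF-16 .klc file, decode it on input and encode it again on output; replacing bytes in the raw UTF-16 data would corrupt the file).

```perl
use strict;
use warnings;

# Convert LF and CRLF line endings to CRLF; existing CRLFs are not doubled.
# (Lone CR line endings are left untouched.)
sub to_dos_eol {
    my ($text) = @_;
    $text =~ s/\r?\n/\r\n/g;
    return $text;
}
```

Applying it twice is harmless, so it is safe to run on a file whose line endings are already DOSish or mixed.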
On principles of intuitive design of Latin keyboard
Some common (meaning: from Latin-1-10 of ISO 8859) Latin alphabet letters are not composed (at least not by using the 3 simplest modifiers out of 8 modifiers). We mean ÆÐÞÇIJØŒß (and ¡¿ for non-alphabetical symbols). It is crucial that they may be entered by an intuitively clear key of the keyboard. There is an obvious ASCII letter associated to each of these (e.g., T associated to the thorn Þ), and in the best world just pressing this letter with the AltGr-modifier would produce the desired symbol.
But what to do with ª,º?
There is only one conflict: both Ø,Œ "want" to be entered as AltGr-O; this is the ONLY piece of arbitrariness in the design so far. After resolving this conflict, AltGr-keys !ASDCTIO? are assigned their meanings, and cannot carry other letters (call them "stuck in stone keys").
(Other keys "stuck in stone" are dead keys: it is important to have the glyph etched on these keyboard's keys similar to the task they perform.)
Then there are several non-alphabetical symbols accessible through ISO 8859 encodings. Assigning them AltGr-access is another task to perform. Some of these symbols come in pairs, such as ≤≥, «», ‹›, “”, ‘’; it makes sense to assign them to paired keyboard's keys: <> or [] or ().
However, this task is in conflict of interests with the following task, so let us explain the needs answered by that task first.
One can always enter accented letters using dead keys; but many people desire a quicker way to access them, by just pressing AltGr-key (possibly with shift). The most primitive keyboard designs (such as IBM International, http://www.borgendale.com/uls.htm) omit this step and assign only the NECESSARY letters for AltGr-access. (Others, like MicroSoft International, assign only a very small set.)
This problem breaks into two tasks: choosing a repertoir of letters which will be typable this way, and mapping them to the keys of the keyboard. For example, EurKey chooses to use ´¨`-accented characters AEUIO (except for Ỳ), plus ÅÑ; MicroSoft International does ÄÅÉÚÍÓÖÁÑß only (and IBM International does none); Bepo does only ÉÈÀÙŸ (but also has the Azeri Ə available - which is not in ISO 8859 - and has Ê on the 105th key "2nd \|"), Mac Extended has only ÝŸ (?!)
http://bepo.fr/wiki/Manuel
http://bepo.fr/wiki/Utilisateur:Masaru # old version of .klc
http://www.jlg-utilities.com/download/us_jlg.klc
http://tlt.its.psu.edu/suggestions/international/accents/codemacext.html
or look for "a graphic of the special characters" on
http://homepage.mac.com/thgewecke/mlingos9.html
Keyboards on Mac: http://homepage.mac.com/thgewecke/mlingos9.html
Tool to produce: http://wordherd.com/keyboards/
http://developer.apple.com/library/mac/#technotes/tn2056/_index.html
Our solution
First, the answer:
- Rule 0:
-
Letters which are not accented by `´¨˜ˆˇ°¯ are entered by AltGr-keys "obviously associated" to them. Supported: ÆÐÞÇIJØß.
- Rule 0a:
-
Same is applicable to Ê and Ñ.
- Rule 1:
-
Vowels AEYUIO accented by `´¨ are assigned the so called "natural position": the 3 letter rows of the keyboard are allocated to accents (¨ is the top, ´ is the middle, ` is the bottom row of the 3 letter-rows on keyboard - so À is on ZXCV-row), and the accented letters are on the same diagonal as the base letter. For left-hand vowels (A,E) the diagonal is in the direction of \, for right-hand vowels (Y,U,I,O) - in the direction of /.
- Rule 1a:
-
If the "natural position" is occupied, the neighbor key in the direction of "the other diagonal" is chosen. (So for A,E it is the /-diagonal, and for right-hand vowels YUIO it is the \-diag.)
- Rule 1b:
-
The neighbor key is down unless the key is on bottom row - then it is up.
Supported by rules "1": all but ÏËỲ.
- Rule 2:
-
Additionally, Å,Œ,Ì are available on keys R,P,V.
Clarification:
If you remember only Rule 0, you still can enter all Latin-1 letters using Rule 0; all you need to memorize are dead keys: `';~6^7& for `´¨˜ˆˇ°¯ on EurKey keyboard (but better locations ARE possible).
(What the rule 0 actually says is: "You do not need to memorize me". ;-)
If all you remember are rules 1,1a, you can calculate the position of the AltGr-key for AEYUIO accented by `´¨ up to a choice of 3 keys (the "natural key" and its 2 neighbors) - which are quick to try all if you forgot the precise position. If you remember rules 1,1ab, then this choice is down to 2 possible candidates.
Essentially, all you must remember in detail is that the "natural positions" form a V-shape (\ on left, / on right), and in case of bad luck you should move in the direction of the other diagonal one step. Then a letter is either in its "obvious position", or in one of 3 modifications of the "natural position". Only Å and Œ need a special memorization.
Motivations:
It is important to have a logical way to quickly understand whether a letter is quickly accessible from a keyboard, and on which key (or, maybe, to find a small set of keys on which a letter may be present - then, if one forgets, it is possible to quickly un-forget by trying a small number of keys).
The idea: we assign alphabetical Latin symbols only to alphabetical keys on the keyboard; this way we can use (paired) symbol keys to enter paired Unicode symbols. Now consider diagonals on the alphabetic part of the keyboard: \-diagonals (like EDC) and /-diagonals (like UHB). Each diagonal contains 3 (or fewer) alphabetic keys; we WANT to assign ¨-accent to the top one, ´-accent to the middle one, and `-accent to the bottom one.
On the left-hand part of the keyboard, use \-diagonals, on the right-hand part use /-diagonals; now each diagonal contains EXACTLY 3 alphabetic keys. Moreover, the diagonals which contain vowels AEYUIO do not intersect.
If we have not decided to have keys set in stone, this would be all - we would get "completely predictable" access to ´¨`-accented characters AEUIO. For example, Ÿ would be accessible on AltGr-Y, Ý on AltGr-G, Ỳ on AltGr-V. Unfortunately, the diagonals contain keys ASDCIO set in stone. So we need a way to "move away" from these keys. The rule is very simple: we move one step away in the direction of "other" diagonal (/-diagonal on the left half, and \-diagonal on the right half) one step down (unless we start on keys A, C where "down" is impossible and we move up to W or F).
Examples: Ä is on Q, Á "wants to be" on A (used for Æ), so it is moved to W; Ö wants to be on O (already used for Ø or Œ), and is moved away to L; È wants to be on C (occupied by Ç), but is moved away to F.
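The "natural position" part of this reasoning is easy to express in Perl. The sketch below covers only Rule 1 and the check against the keys set in stone (it does not compute where a displaced letter moves, and it ignores keys taken by Rules 0a and 2); the diagonals are read off a standard QWERTY layout, and the names %diagonal, %row, %stone and natural_position are invented for this example.

```perl
use strict;
use warnings;

# Diagonals through the vowels, listed from the top letter row to the bottom one
my %diagonal = (
    A => [qw(Q A Z)], E => [qw(E D C)],     # left hand: \-diagonals
    Y => [qw(Y G V)], U => [qw(U H B)],     # right hand: /-diagonals
    I => [qw(I J N)], O => [qw(O K M)],
);

# Accent => row: diaeresis on the top row, acute in the middle, grave at the bottom
my %row = (diaeresis => 0, acute => 1, grave => 2);

# Keys on these diagonals which are "set in stone" (ASDCIO in the text above)
my %stone = map { $_ => 1 } qw(A S D C I O);

# Returns (key, flag); the flag is 1 if the natural position is set in stone,
# so that the letter must be moved by Rules 1a, 1b.
sub natural_position {
    my ($vowel, $accent) = @_;
    my $key = $diagonal{$vowel}[ $row{$accent} ];
    return ($key, $stone{$key} ? 1 : 0);
}
```

For instance, natural_position('A', 'diaeresis') gives ('Q', 0), so Ä stays on Q, while natural_position('A', 'acute') gives ('A', 1), so Á has to move away.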
There is no way to enter Ï using this layout (unless we agree to move it to the "8*" key, which may conflict with convenience of entering typographic quotation marks). Fortunately, this letter is rare (compared even to Ë which is quite frequent in Dutch). So it is no big deal that it is not available for "handy" input - remember that one can always use deadkeys.
http://en.wikipedia.org/wiki/Letter_frequency#Relative_frequencies_of_letters_in_other_languages
Note that the keys "P" and "R" are not engaged by this layout; since "P" is a neighbor of "O", it is natural to use it to resolve the conflict between Ø and Œ (which both want to be set in stone on "O"). This leaves only the key "R" unengaged; but what we do not cover are two letters Å and Ñ which are relatively frequent in Latin-derived European languages.
Note that Ì is moderately frequent in Italian, but Ñ is much more frequent in Spanish. Since Ì occupies the key which on many keyboards is taken by Ñ, maybe it makes sense to switch them... Likewise, Ê is much more frequent than Ë; switch them.
(OLD?) TODO
U-caron: ǔ, Ǔ which is used to indicate u in the third tone of Chinese language pinyin. But U-breve is used in Latin encodings. Ǧ/ǧ (G with caron) is used, but only in "exotic" or old languages (has no combined form - while G-breve is in Latin encodings). A-breve Ă: A-caron Ǎ is not in Latin-N; apparently, is used only in pinyin, Zarma, Hokkien, Vietnamese, IPA, transliteration of Old Latin, Bible and Cyrillic's big yus.
In EurKey: only a takes breve, the rest take caron (including G but not U)
out of accents ° and dot-accent ˙ in Latin-N: only A and U take °, and they do not take dot-accent. In EurKey: also small w,y take ring accent; same in Bepo - but they do not take dot accent in Latin-N.
Double-´ and cornu (both on a,u only) can be taken by ¨ or ˙ on letters with ¨ already present (in Unicode ¨ is not precombined with diaeresis or dots). But one must special-case Ë and Ï and Ø (have Ê and IJ instead; IJ takes no accents, but Ê takes acute, grave, tilde and dot below...).! Æ takes acute and macron; Ø takes acute.
Actually, cornu=horn is only on o,u, so using dot/ring on ö and ü is very viable...
So for using AltGr-letter after deadkeys: diaeresis can take dot above, hat and wedge, diaeresis. Likewise, ` and ´ are not precombined together (but there is a combined combining mark). So one can do something else on vowels (ogonek?).
Applying ´ to `-accented forms: we do not have `y, so must use "the natural position" which is mixed with Ñ (takes no accents) and Ç (takes acute!!!).
s, t do not precombine with `; so can use for the "alternative cedilla".
Only auwy take ring, and they do not take cedilla. Can merge.
Bepo's hook above; ảɓƈɗẻểƒɠɦỉƙɱỏƥʠʂɚƭủʋⱳƴỷȥ ẢƁƇƊẺỂƑƓỈƘⱮỎƤƬỦƲⱲƳỶȤ
    perl -wlnae "next unless /HOOK/; push @F, shift @F; print qq(@F)" NamesList.txt | sort | less
Of capital letters only T and Y take different kinds of hooks... (And for T both are in Latin-Extended-B...)