FreeBSD/Linux Kernel Cross Reference
sys/Documentation/unicode.txt

    1                  Last update: 2005-01-17, version 1.4
    2 
    3 This file is maintained by H. Peter Anvin <unicode@lanana.org> as part
    4 of the Linux Assigned Names And Numbers Authority (LANANA) project.
    5 The current version can be found at:
    6 
    7             http://www.lanana.org/docs/unicode/unicode.txt
    8 
    9                        ------------------------
   10 
   11 The Linux kernel code has been rewritten to use Unicode to map
   12 characters to fonts.  By downloading a single Unicode-to-font table,
   13 both the eight-bit character sets and UTF-8 mode are changed to use
   14 the font as indicated.
   15 
   16 This changes the semantics of the eight-bit character tables subtly.
   17 The four character tables are now:
   18 
   19 Map symbol      Map name                        Escape code (G0)
   20 
   21 LAT1_MAP        Latin-1 (ISO 8859-1)            ESC ( B
   22 GRAF_MAP        DEC VT100 pseudographics        ESC ( 0
   23 IBMPC_MAP       IBM code page 437               ESC ( U
   24 USER_MAP        User defined                    ESC ( K
   25 
   26 In particular, ESC ( U is no longer "straight to font", since the font
   27 might be completely different from the IBM character set.  This
   28 permits, for example, the use of block graphics even with a Latin-1
   29 font loaded.
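
As a minimal sketch of the table above, the following Python helper
builds the G0 selection sequence for each map.  The names select_g0 and
G0_FINAL_BYTE are illustrative and not part of the kernel; only the
final bytes are taken from the table.

```python
# Final byte of the "ESC ( x" sequence for each map, copied from the table.
G0_FINAL_BYTE = {
    "LAT1_MAP": b"B",   # Latin-1 (ISO 8859-1)
    "GRAF_MAP": b"0",   # DEC VT100 pseudographics
    "IBMPC_MAP": b"U",  # IBM code page 437
    "USER_MAP": b"K",   # User defined
}


def select_g0(map_symbol: str) -> bytes:
    """Return the escape sequence that selects map_symbol as G0."""
    # ESC is byte 0x1B; "(" designates the G0 set.
    return b"\x1b(" + G0_FINAL_BYTE[map_symbol]
```

Writing the result of select_g0("GRAF_MAP") to a Linux console should
switch G0 to the DEC VT100 pseudographics map; how other terminals
react is outside the scope of this sketch.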
   30 
   31 Note that although these codes are similar to ISO 2022, neither the
   32 codes nor their uses match ISO 2022; Linux has two 8-bit codes (G0 and
   33 G1), whereas ISO 2022 has four 7-bit codes (G0-G3).
   34 
   35 In accordance with the Unicode standard/ISO 10646, the range U+F000 to
   36 U+F8FF has been reserved for OS-wide allocation (the Unicode Standard
   37 refers to this as a "Corporate Zone"; since this is inaccurate for
   38 Linux, we call it the "Linux Zone").  U+F000 was picked as the starting
   39 point since it lets the direct-mapping area start on a large power of
   40 two (in case 1024- or 2048-character fonts ever become necessary).
   41 This leaves U+E000 to U+EFFF as the End User Zone.
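
The split described above can be sketched as a small classifier.  This
is an illustration only; the function name private_use_zone and the
returned strings are assumptions made for the example, while the range
boundaries come from the paragraph above.

```python
from typing import Optional

END_USER_ZONE = range(0xE000, 0xF000)  # U+E000..U+EFFF
LINUX_ZONE = range(0xF000, 0xF900)     # U+F000..U+F8FF


def private_use_zone(codepoint: int) -> Optional[str]:
    """Name the zone a code point falls in, or None if it is in neither."""
    if codepoint in END_USER_ZONE:
        return "End User Zone"
    if codepoint in LINUX_ZONE:
        return "Linux Zone"
    return None
```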
   42 
   43 [v1.2]: The Unicode range from U+F000 up to U+F7FF has been
   44 hard-coded to map directly to the loaded font, bypassing the
   45 translation table.  The user-defined map now defaults to U+F000 to
   46 U+F0FF, emulating the previous behaviour.  In practice, this range
   47 might be shorter; for example, vgacon can only handle 256-character
   48 (U+F000..U+F0FF) or 512-character (U+F000..U+F1FF) fonts.
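
One way to read the direct-mapping rule is that the glyph index is the
offset of the code point from U+F000, limited by the size of the loaded
font.  The sketch below assumes that reading; the function name
direct_glyph_index and its font_size parameter are hypothetical and do
not correspond to a kernel interface.

```python
from typing import Optional

DIRECT_BASE = 0xF000  # first code point of the direct-to-font area
DIRECT_LAST = 0xF7FF  # last code point of the direct-to-font area


def direct_glyph_index(codepoint: int, font_size: int = 256) -> Optional[int]:
    """Return the glyph index for a direct-to-font code point.

    Returns None when the code point lies outside U+F000..U+F7FF or
    beyond the number of glyphs in the loaded font (font_size).
    """
    if not DIRECT_BASE <= codepoint <= DIRECT_LAST:
        return None
    index = codepoint - DIRECT_BASE
    return index if index < font_size else None
```

With a 256-character font, only U+F000..U+F0FF yield an index; passing
font_size=512 extends that to U+F1FF, matching the vgacon limits
mentioned above.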
   49 
   50 
   51 Actual characters assigned in the Linux Zone
   52 --------------------------------------------
   53 
   54 In addition, the following characters not present in Unicode 1.1.4
   55 have been defined; these are used by the DEC VT graphics map.  [v1.2]
   56 THIS USE IS OBSOLETE AND SHOULD NO LONGER BE USED; PLEASE SEE BELOW.
   57 
   58 U+F800 DEC VT GRAPHICS HORIZONTAL LINE SCAN 1
   59 U+F801 DEC VT GRAPHICS HORIZONTAL LINE SCAN 3
   60 U+F803 DEC VT GRAPHICS HORIZONTAL LINE SCAN 7
   61 U+F804 DEC VT GRAPHICS HORIZONTAL LINE SCAN 9
   62 
   63 The DEC VT220 uses a 6x10 character matrix, and these characters form
   64 a smooth progression in the DEC VT graphics character set.  I have
   65 omitted the scan 5 line, since it is also used as a block-graphics
   66 character, and hence has been coded as U+2500 FORMS LIGHT HORIZONTAL.
   67 
   68 [v1.3]: These characters have been officially added to Unicode 3.2.0;
   69 they are added at U+23BA, U+23BB, U+23BC, U+23BD.  Linux now uses the
   70 new values.
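
Text that still uses the obsolete Linux Zone values can be migrated
with a simple translation table.  The sketch below assumes the four new
code points correspond to the old ones in the order listed (scan 1, 3,
7, 9); the names OBSOLETE_TO_STANDARD and modernize are illustrative.

```python
# Obsolete Linux Zone value -> Unicode 3.2.0 value, paired in listed order.
OBSOLETE_TO_STANDARD = {
    0xF800: 0x23BA,  # horizontal line, scan 1
    0xF801: 0x23BB,  # horizontal line, scan 3
    0xF803: 0x23BC,  # horizontal line, scan 7
    0xF804: 0x23BD,  # horizontal line, scan 9
}


def modernize(text: str) -> str:
    """Replace obsolete DEC VT graphics code points with the new values."""
    # str.translate accepts a dict mapping code points to code points;
    # characters not in the dict are left unchanged.
    return text.translate(OBSOLETE_TO_STANDARD)
```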
   71 
   72 [v1.2]: The following characters have been added to represent common
   73 keyboard symbols that are unlikely to ever be added to Unicode proper
   74 since they are horribly vendor-specific.  This, of course, is an
   75 excellent example of horrible design.
   76 
   77 U+F810 KEYBOARD SYMBOL FLYING FLAG
   78 U+F811 KEYBOARD SYMBOL PULLDOWN MENU
   79 U+F812 KEYBOARD SYMBOL OPEN APPLE
   80 U+F813 KEYBOARD SYMBOL SOLID APPLE
   81 
   82 Klingon language support
   83 ------------------------
   84 
   85 In 1996, Linux was the first operating system in the world to add
   86 support for the artificial language Klingon, created by Marc Okrand
   87 for the "Star Trek" television series.  This encoding was later
   88 adopted by the ConScript Unicode Registry and proposed (but ultimately
   89 rejected) for inclusion in Unicode Plane 1.  Thus, it remains as a
   90 Linux/CSUR private assignment in the Linux Zone.
   91 
   92 This encoding has been endorsed by the Klingon Language Institute.
   93 For more information, contact them at:
   94 
   95         http://www.kli.org/
   96 
   97 Since the characters in the beginning of the Linux CZ have been more
   98 of the dingbats/symbols/forms type and this is a language, I have
   99 located it at the end, on a 16-cell boundary in keeping with standard
  100 Unicode practice.
  101 
  102 NOTE: This range is now officially managed by the ConScript Unicode
  103 Registry.  The normative reference is at:
  104 
  105         http://www.evertype.com/standards/csur/klingon.html
  106 
  107 Klingon has an alphabet of 26 characters, a positional numeric writing
  108 system with 10 digits, and is written left-to-right, top-to-bottom.
  109 
  110 Several glyph forms for the Klingon alphabet have been proposed.
  111 However, since the set of symbols appears to be consistent throughout,
  112 with only the actual shapes being different, in keeping with standard
  113 Unicode practice these differences are considered font variants.
  114 
  115 U+F8D0  KLINGON LETTER A
  116 U+F8D1  KLINGON LETTER B
  117 U+F8D2  KLINGON LETTER CH
  118 U+F8D3  KLINGON LETTER D
  119 U+F8D4  KLINGON LETTER E
  120 U+F8D5  KLINGON LETTER GH
  121 U+F8D6  KLINGON LETTER H
  122 U+F8D7  KLINGON LETTER I
  123 U+F8D8  KLINGON LETTER J
  124 U+F8D9  KLINGON LETTER L
  125 U+F8DA  KLINGON LETTER M
  126 U+F8DB  KLINGON LETTER N
  127 U+F8DC  KLINGON LETTER NG
  128 U+F8DD  KLINGON LETTER O
  129 U+F8DE  KLINGON LETTER P
  130 U+F8DF  KLINGON LETTER Q
  131         - Written <q> in standard Okrand Latin transliteration
  132 U+F8E0  KLINGON LETTER QH
  133         - Written <Q> in standard Okrand Latin transliteration
  134 U+F8E1  KLINGON LETTER R
  135 U+F8E2  KLINGON LETTER S
  136 U+F8E3  KLINGON LETTER T
  137 U+F8E4  KLINGON LETTER TLH
  138 U+F8E5  KLINGON LETTER U
  139 U+F8E6  KLINGON LETTER V
  140 U+F8E7  KLINGON LETTER W
  141 U+F8E8  KLINGON LETTER Y
  142 U+F8E9  KLINGON LETTER GLOTTAL STOP
  143 
  144 U+F8F0  KLINGON DIGIT ZERO
  145 U+F8F1  KLINGON DIGIT ONE
  146 U+F8F2  KLINGON DIGIT TWO
  147 U+F8F3  KLINGON DIGIT THREE
  148 U+F8F4  KLINGON DIGIT FOUR
  149 U+F8F5  KLINGON DIGIT FIVE
  150 U+F8F6  KLINGON DIGIT SIX
  151 U+F8F7  KLINGON DIGIT SEVEN
  152 U+F8F8  KLINGON DIGIT EIGHT
  153 U+F8F9  KLINGON DIGIT NINE
  154 
  155 U+F8FD  KLINGON COMMA
  156 U+F8FE  KLINGON FULL STOP
  157 U+F8FF  KLINGON SYMBOL FOR EMPIRE
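
Because the letters and digits are assigned consecutively, the table
above reduces to two base offsets.  The following sketch looks up a
letter by the name used in the table and encodes a non-negative integer
with the Klingon digits.  The function names are illustrative, and
writing the most significant digit first is an assumption; the text
above only states that the system is positional and written
left-to-right.

```python
# Letter names in code point order, copied from the table above.
LETTER_NAMES = [
    "A", "B", "CH", "D", "E", "GH", "H", "I", "J", "L", "M", "N", "NG",
    "O", "P", "Q", "QH", "R", "S", "T", "TLH", "U", "V", "W", "Y",
    "GLOTTAL STOP",
]
LETTER_BASE = 0xF8D0  # KLINGON LETTER A
DIGIT_BASE = 0xF8F0   # KLINGON DIGIT ZERO


def letter_codepoint(name: str) -> int:
    """Return the code point of KLINGON LETTER <name>."""
    # list.index raises ValueError for a name that is not in the table.
    return LETTER_BASE + LETTER_NAMES.index(name)


def encode_number(value: int) -> str:
    """Encode a non-negative integer using the Klingon digits."""
    if value < 0:
        raise ValueError("only non-negative integers are supported")
    # Assumption: most significant digit first, as in decimal notation.
    return "".join(chr(DIGIT_BASE + int(d)) for d in str(value))
```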
  158 
  159 Other Fictional and Artificial Scripts
  160 --------------------------------------
  161 
  162 Since the assignment of the Klingon Linux Unicode block, a registry of
  163 fictional and artificial scripts has been established by John Cowan
  164 <jcowan@reutershealth.com> and Michael Everson <everson@evertype.com>.
  165 The ConScript Unicode Registry is accessible at:
  166 
  167           http://www.evertype.com/standards/csur/
  168 
  169 The ranges used fall at the low end of the End User Zone and hence
  170 cannot be normatively assigned, but it is recommended that people who
  171 wish to encode fictional scripts use these codes, in the interest of
  172 interoperability.  For Klingon, CSUR has adopted the Linux encoding.
  173 The CSUR people are working to add Tengwar and Cirth to Unicode
  174 Plane 1; the addition of Klingon to Unicode Plane 1 has been rejected
  175 and so the above encoding remains official.


This page is part of the FreeBSD/Linux Kernel Cross-Reference, and was automatically generated using a modified version of the LXR engine.