The Unicode Standard, Version 4.0 (English) Hardcover – 2003/8/29
The Unicode Consortium is a non-profit organization founded to develop, extend, and promote the use of the Unicode Standard. The membership of the Consortium represents a broad spectrum of corporations and organizations in the computer and information processing industry. The Unicode Consortium actively cooperates with many of the leading standards development organizations, including ISO/IEC JTC1, W3C, IETF, and ECMA.
That said, I think it should be treated only as a reference. The Unicode Standard 4.0 has (probably) not been implemented yet, and as the international standard it is better to consult ISO/IEC 10646-1 and 10646-2.
The content of ISO standard 10646 (successor to 7-bit ISO 646) goes way beyond just a character set. It contains information critical to the correctness of any program that steps outside the English-language world, i.e. every program on the Internet, and many others sooner or later. This is the basis for correct handling of numerals (there's a lot more than 0 to 9), letters, and text. It's also the explanation for some program behaviors that might otherwise baffle a programmer, or at least a programmer with the wit to be baffled.
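To make the point about numerals concrete, here is a minimal sketch using Python's standard `unicodedata` module. That module is my choice, not anything from the book, and it reflects whatever newer Unicode version your Python ships with rather than 4.0 specifically; the characters below were already in 4.0. It shows decimal digits that are not ASCII 0-9, plus characters that have a numeric value without being digits at all.

```python
import unicodedata

# Characters Unicode classifies as decimal digits (general category Nd),
# none of which is an ASCII 0-9.
for ch in ("\u0663", "\u0967", "\uff15"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"category={unicodedata.category(ch)} decimal={unicodedata.decimal(ch)}")

# Python's int() accepts any run of Unicode decimal digits, not just ASCII ones.
print(int("\u0663\u0664"))                        # 34 (Arabic-Indic digits)

# Some characters carry a numeric value but are not decimal digits.
print(unicodedata.numeric("\u00bd"))              # 0.5 (VULGAR FRACTION ONE HALF)
print("\u2167".isdigit(), "\u2167".isnumeric())   # False True (ROMAN NUMERAL EIGHT)
```

A program that assumes "digit" means `'0' <= c <= '9'` will disagree with `isdigit()` and `int()` on exactly these inputs, which is the kind of baffling behavior the review describes.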
More than just crucial, the content of this standard is plain fun. Its snippets of information from every major world language give wonderful insight into how people express themselves. It drives home the delightful diversity of human language and experience. It's also a near-bottomless source of stump-your-friends trivia.
I admit, I'll never use every fact in this incredible assembly. I use a lot of the information, though, and I use it as the point of entry into every discussion of internationalization and localization of software.
But chances are, when you deal with Unicode, you only deal with a subset. Often only a small subset at that, unless you are using Chinese/Japanese. Typically you work with ASCII plus the codes for your own language, if that is not a Western European language. Very few of us deal with much more than this.
Which illustrates the appeal of the book. The Big Picture. ALL of Unicode. The breadth is stunning. It shows the written form of every major spoken language and many minor ones. It has the ideographs for Chinese [of course]. But also the symbols for Khmer, Canadian Aboriginal, Tamil, Syriac, et cetera, et cetera. Thumbing through this, you may encounter languages that you did not even know existed. It is one thing to say that we live in a multilingual world. But it is another to actually see it expressed comprehensively at the most basic level.
There are two audiences for this book. The first is any computer person who has to deal with issues of internationalisation.
But another audience is every Department of Languages or Cultural Anthropology in a university. If this describes your background, then you should know that you do not need facility in computing to appreciate the significance of this book. You can use it as a standard reference, akin to the Oxford English Dictionary vis-a-vis the English language. Look, ignore the computer stuff in the text. Yes, you can do this. The book groups related languages into common chapters. The explanatory text is lucid and the graphics for the languages let you easily cross compare. Of course, at a higher level of meaning like sentences, you will need specialised texts in those languages. But to understand a language, you need to start at its letters or pictograms.
Think of this book as an index into all the languages of man.
Browse through the book online just as you would in a bookstore or library. Print out parts of it or all of it for free if you want. Well, it is free if you don't count the cost of paper (about 1500 sheets or twice that for simplex printing), the cost of a binder (or maybe two binders) and the time you would have to spend punching the holes.
If you are mainly or only interested in particular sections of the standard then printing only those sections may be a reasonable thing to do.
On the other hand the price is *very* reasonable for an 8½" × 11" hardbound book with 1,462 pages. If it's the sort of book you know you want for browsing and for reference then it is likely you will want it in this nicely bound copy.
Like the previously published versions of the Unicode Standard, this is a beautiful book that is useful even to those who don't need or want to get into the technical details of character properties, the rules for bidirectional display, and the other rules necessary for displaying the characters. But for the actual use of many characters you will have to consult sources outside the Unicode book and its files, e.g. dictionaries and grammars of various languages or explanations of symbols used in various fields of mathematics.
Language and writing systems are messy and inconsistent, and handling them systematically and coherently cannot be made easy. Accordingly the rules and explanations in this standard are by necessity often long and involved and couched in technical language. It can't be avoided that, for example, one must sometimes distinguish carefully between _characters_, _glyphs_, _graphemes_, _grapheme clusters_, _ligatures_ and _digraphs_, and whether one character is a _canonical equivalent_ of another character or sequence of characters, a _compatibility equivalent_ of another character or sequence of characters, or just similar to another character or sequence of characters.
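The three-way distinction at the end of that paragraph can be illustrated with a minimal sketch. It assumes Python's `unicodedata.normalize` (my choice of tool; the two helper functions are hypothetical names for illustration) and tests equivalence by comparing normalized forms: NFD for canonical equivalence, NFKD for compatibility equivalence.

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: same canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

def compatibility_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: same compatibility decomposition (NFKD)."""
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
combining = "e\u0301"    # e + COMBINING ACUTE ACCENT
ligature = "\ufb01"      # LATIN SMALL LIGATURE FI
angstrom = "\u212b"      # ANGSTROM SIGN
a_ring = "\u00c5"        # LATIN CAPITAL LETTER A WITH RING ABOVE
cyrillic_a = "\u0410"    # CYRILLIC CAPITAL LETTER A

print(len(precomposed), len(combining))                # 1 2 (one grapheme, two code points)
print(canonically_equivalent(precomposed, combining))  # True
print(canonically_equivalent(angstrom, a_ring))        # True
print(canonically_equivalent(ligature, "fi"))          # False
print(compatibility_equivalent(ligature, "fi"))        # True
print(compatibility_equivalent("A", cyrillic_a))       # False: merely similar-looking
```

The last line is the "just similar" case: Latin A and Cyrillic A look alike but no normalization form relates them.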
The Unicode character set is still a work in progress. Version 4.0 may not even approach the half-way mark in encoding every character that has been used in normal text records by human beings for which a meaning is known. No-one has ever tried to produce a list of characters on this scale before. No-one yet knows how many distinct characters there are.
But 4.0 covers 96,382 characters from *almost* every script currently used for modern languages and from some ancient scripts as well, including Ugaritic cuneiform, Cretan Linear B and the ancient Cypriot syllabary. (Sumerian/Akkadian cuneiform is being worked on and Egyptian hieroglyphics will eventually follow.)
Included are a plethora of technical symbol characters including mathematical characters, chess pieces, die faces, characters needed for modern western music notation, characters needed for Byzantine music notation, ornamental dingbats and so much more. All of it is now at the fingertips of every computer user -- that is if fonts that contain the characters are installed.
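As a rough illustration that these symbols are ordinary, named code points whether or not a font can draw them, here is a minimal sketch using Python's `unicodedata.lookup` (my choice of tool, not something the book provides). It prints only code points and names, since showing the glyphs themselves depends on the installed fonts.

```python
import unicodedata

# Standard Unicode character names for a few of the symbols mentioned above.
names = [
    "WHITE CHESS KING",
    "DIE FACE-1",
    "MUSICAL SYMBOL G CLEF",
    "LINEAR B SYLLABLE B008 A",
]

for name in names:
    ch = unicodedata.lookup(name)   # raises KeyError if the name is unknown
    print(f"U+{ord(ch):04X}  {name}")
```

The G clef and the Linear B syllable sit above U+FFFF, outside the Basic Multilingual Plane.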
Finding fonts that display some of these characters is still a problem. :-(
But it would be a worse problem if these characters weren't assigned to a common character set. The past practice of numerous special fonts for various symbols and scripts which disagreed with one another on how the characters were encoded produced a horrible mess.
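The same kind of mess shows up with legacy 8-bit code pages, not just special fonts. A minimal sketch with Python's built-in codecs (my choice of illustration): one byte value decodes to three unrelated characters depending on which encoding the reader assumes.

```python
import unicodedata

# The same single byte, read under three different legacy 8-bit encodings.
raw = b"\xe9"

for codec in ("latin-1", "cp1251", "iso8859_7"):
    ch = raw.decode(codec)
    print(f"{codec:>10}: U+{ord(ch):04X} {unicodedata.name(ch)}")
```

This prints LATIN SMALL LETTER E WITH ACUTE, CYRILLIC SMALL LETTER SHORT I and GREEK SMALL LETTER IOTA. In a common character set each of these simply has its own code point, so nothing has to be guessed.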
Large as it is, with 40% more pages than version 3.0, the book doesn't contain the whole standard. Increasingly, as the standard has expanded, tabular material has been dropped from the printed volumes and replaced with references to data files available on the website or on the CD that comes with the book.
The end of section 3.2 specifies six files found as Annexes on the website and on the CD which "are essential parts of version 4.0" including an explanation of the bidirectional algorithm which appeared in the printed text for earlier releases. And there are many mentions in the printed standard of other files available on the CD or website. A binder containing printouts of this material is necessary if you want a truly complete hardcopy of the entire 4.0 standard.
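The bidirectional algorithm (UAX #9) starts from a per-character property, the Bidi_Class. Python's standard `unicodedata` module exposes that property but does not implement the reordering algorithm itself, so the sketch below shows only the input the algorithm works from, not the algorithm.

```python
import unicodedata

# Latin letter, ASCII digit, Hebrew letter, Arabic letter, space.
samples = ["A", "1", "\u05d0", "\u0627", " "]

for ch in samples:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch):<24} {unicodedata.bidirectional(ch)}")
```

The classes come out as L (left-to-right), EN (European number), R (right-to-left), AL (Arabic letter) and WS (whitespace); the rules in the annex then resolve runs of these into display order.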
Unfortunately the 4.0 HTML files are carelessly laid down on the CD, with external links pointing to files on the Unicode website rather than to the corresponding files on the CD. Graphics are sometimes missing, though the only file where I think this matters is StandardizedVariants.html, which has a number of variant character images. (The data in this short file should have been in the book.)
If you work online you probably won't notice anything wrong, but you are also unlikely to notice that after clicking on a link you are viewing a file from the Unicode website instead of a file on the CD. That may matter in the future if you need to reference a 4.0 file and don't observe that the file you are actually looking at is from the website and is a "latest version" file that has been updated beyond 4.0. If you are working offline you can avoid this, but it is annoying to have to search for each file by name manually whenever a link fails.
Also, although the Readme.txt file on the CD mentions "mapping tables" and files with "the extension .UNI", these useful conversion tables, which were included on the CDs with previous releases, are missing from the 4.0 CD. But they are available on the website.
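For anyone fetching those conversion tables from the website, a minimal parsing sketch follows. The line format is an assumption on my part (tab-separated hex columns, legacy code then Unicode code point, with `#` starting a comment, and undefined entries lacking the second column); check the header of the actual file. The function name and the sample text are hypothetical, with the sample modeled on cp1252 so it can be cross-checked against Python's own codec.

```python
def parse_mapping_table(text: str) -> dict[int, int]:
    """Hypothetical parser for the ASSUMED format '0xE9<TAB>0x00E9<TAB>#NAME'."""
    table: dict[int, int] = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop the comment, split on whitespace
        if len(fields) < 2:
            continue                            # blank, comment-only, or undefined entry
        table[int(fields[0], 16)] = int(fields[1], 16)  # int() accepts the 0x prefix
    return table

# Hypothetical excerpt in the assumed format (values follow cp1252).
sample = """\
#   hypothetical excerpt
0x41\t0x0041\t#LATIN CAPITAL LETTER A
0x80\t0x20AC\t#EURO SIGN
0x81\t\t#UNDEFINED
0xE9\t0x00E9\t#LATIN SMALL LETTER E WITH ACUTE
"""

mapping = parse_mapping_table(sample)
for legacy, cp in sorted(mapping.items()):
    # Cross-check each entry against Python's built-in cp1252 codec.
    agrees = bytes([legacy]).decode("cp1252") == chr(cp)
    print(f"0x{legacy:02X} -> U+{cp:04X}  matches cp1252 codec: {agrees}")
```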
This is a minor caveat. I suspect most people will use the website in any case rather than the CD.