CHAPTER 9 International Languages and Character Sets
• Operating system The client operating system has text displayed on
its interface, and may also process text.
For a satisfactory working environment, all these sources of text must work
together. Loosely speaking, they must all be working in the user’s language
and/or character set.
Code pages in Windows and Windows NT
Many languages have few enough characters to be represented in a single-byte
character set. In such a character set, each character is represented by a single
byte: a two-digit hexadecimal number.
At most, 256 characters can be represented in a single byte. No single-byte
character set can hold all of the characters used internationally, including
accented characters. This problem was addressed by the development of a set
of code pages, each of which describes a set of characters appropriate for one
or more national languages. For example, code page 869 contains the Greek
character set, and code page 850 contains an international character set suitable
for representing many characters in a variety of languages.
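As a rough illustration of how each code page covers only some characters, the following Python sketch checks whether a character can be encoded in a given code page. It assumes Python's built-in cp869 and cp850 codecs match the code pages named above; the helper can_encode is a hypothetical name introduced only for this example.

```python
def can_encode(text: str, code_page: str) -> bool:
    """Return True if every character in text exists in the given code page."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


OMEGA = "\u03a9"    # Greek capital letter omega
A_GRAVE = "\u00c0"  # Latin capital letter A with grave

# Check both characters against the two code pages named in the text.
coverage = {
    (char, page): can_encode(char, page)
    for char in (OMEGA, A_GRAVE)
    for page in ("cp869", "cp850")
}

for (char, page), ok in coverage.items():
    print(f"U+{ord(char):04X} representable in {page}: {ok}")
```

The Greek letter is available only in the Greek code page, and the accented Latin letter only in the international one, which is why no single code page suits every language.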
Upper and lower pages
With few exceptions, characters 0 to 127 are the same for all the single-byte
code pages. The mapping for this range of characters is called the ASCII
character set. It includes the English language alphabet in upper and lower
case, as well as common punctuation symbols and the digits. This range is
often called the seven-bit range (because only seven bits are needed to
represent the numbers up to 127) or the lower page. The characters from 128
to 255 are called extended characters, or upper code-page characters, and
vary from code page to code page.
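The split between the lower and upper pages can be checked with a short Python sketch. It assumes Python's built-in cp437, cp850 and cp869 codecs as stand-ins for the corresponding code pages; this choice of three pages is only an example, and other code pages may behave differently.

```python
CODE_PAGES = ("cp437", "cp850", "cp869")  # example selection, not exhaustive

lower = bytes(range(0, 128))    # seven-bit range (lower page)
upper = bytes(range(128, 256))  # extended characters (upper page)


def decode_range(raw: bytes, code_page: str) -> str:
    # errors="replace" substitutes U+FFFD for byte values that a code page
    # leaves undefined, so every byte yields exactly one character.
    return raw.decode(code_page, errors="replace")


lower_maps = {cp: decode_range(lower, cp) for cp in CODE_PAGES}
upper_maps = {cp: decode_range(upper, cp) for cp in CODE_PAGES}

# True if every code page maps bytes 0 to 127 exactly as ASCII does.
lower_identical = all(m == lower.decode("ascii") for m in lower_maps.values())
# True only if all code pages agree on every byte from 128 to 255.
upper_identical = len(set(upper_maps.values())) == 1

print("lower page identical:", lower_identical)
print("upper page identical:", upper_identical)
```

For these three code pages the lower page matches ASCII in every case, while the upper pages disagree with one another.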
Problems with code page compatibility are rare if the only characters used are
from the English alphabet, as these are represented in the ASCII portion of
each code page (0 to 127). However, if other characters are used, as is generally
the case in any non-English environment, there can be problems if the database
and the application use different code pages.
Example
Suppose a database holding French language strings uses code page 850, and
the client operating system uses code page 437. The character À (upper case A
grave) is held in the database as character \xB7 (decimal value 183). In code
page 437, character \xB7 is a graphical character. When the client application
receives this byte and the operating system displays it on the screen, the user
sees a graphical character instead of an A grave.
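The mismatch in this example can be reproduced with a minimal Python sketch, assuming Python's built-in cp850 and cp437 codecs correspond to the code pages described. The variable names are illustrative only.

```python
A_GRAVE = "\u00c0"  # Latin capital letter A with grave

# The database stores the character using code page 850.
stored = A_GRAVE.encode("cp850")
print("stored byte:", stored.hex(), "decimal", stored[0])

# The client interprets the same byte using code page 437.
shown = stored.decode("cp437")
print(f"client displays U+{ord(shown):04X}")  # a box-drawing character

# Interpreting the byte with the database's own code page recovers the text.
recovered = stored.decode("cp850")
print("round trip intact:", recovered == A_GRAVE)
```

The byte itself never changes; only the code page used to interpret it differs between the database and the client.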