22. The ASCII and Unicode Character Sets

 

Java uses version 2.0 of the Unicode character set for representing character data. The Unicode set represents each character as a 16-bit unsigned integer. It can, therefore, represent 2^16 = 65,536 different characters. This enables Unicode to represent characters from not only English but also a wide range of international languages. For details about Unicode, see


Unicode supersedes the ASCII character set (American Standard Code for Information Interchange). The ASCII code represents each character as a 7-bit or 8-bit unsigned integer. A 7-bit code can represent only 2^7 = 128 characters. In order to make Unicode backward compatible with ASCII, the first 128 characters of Unicode have the same integer representation as the ASCII characters.
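The arithmetic above can be checked directly in Java. The following is a minimal sketch, not part of the original text; the class name CharRange and the helper isAscii are hypothetical names chosen for illustration. It relies only on the fact that Java's char type is a 16-bit unsigned integer.

```java
public class CharRange {
    // Hypothetical helper: true if c lies in the 7-bit ASCII range 0..127.
    public static boolean isAscii(char c) {
        return c < 128;
    }

    public static void main(String[] args) {
        // A char is a 16-bit unsigned value, so its codes run from 0 to 65,535.
        System.out.println((int) Character.MIN_VALUE); // 0
        System.out.println((int) Character.MAX_VALUE); // 65535

        // Number of distinct codes in a 16-bit and a 7-bit representation.
        System.out.println(1 << 16); // 65536
        System.out.println(1 << 7);  // 128

        // Casting a char to int yields its integer code; for the first
        // 128 characters this is the same as the ASCII code.
        System.out.println((int) 'A');  // 65
        System.out.println(isAscii('A'));      // true
        System.out.println(isAscii('\u00e9')); // false (code 233, outside ASCII)
    }
}
```

Casting between char and int is the usual way to move between a character and its code, so (char) 65 gives back 'A'.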

Table C.1 shows the integer representations for the printable subset of ASCII characters. The characters with codes 0 through 31 and code 127 are nonprintable characters, many of which are associated with keys on a standard keyboard. For example, the delete key is represented by 127, the backspace by 8, and the return key by 13.
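The nonprintable codes mentioned above can be inspected the same way. This is an illustrative sketch under the assumptions stated in the paragraph (codes 0 through 31 and code 127 are nonprintable); the class name ControlCodes and the helper isNonprintableAscii are hypothetical.

```java
public class ControlCodes {
    // Hypothetical helper: true for codes 0 through 31 and for code 127,
    // the nonprintable ASCII characters described in the text.
    public static boolean isNonprintableAscii(char c) {
        return c <= 31 || c == 127;
    }

    public static void main(String[] args) {
        // Java escape sequences for backspace and carriage return.
        System.out.println((int) '\b'); // 8
        System.out.println((int) '\r'); // 13

        // The delete character has no escape sequence, so build it from its code.
        char delete = (char) 127;
        System.out.println(isNonprintableAscii(delete)); // true

        // Ordinary letters and the space character (code 32) are printable.
        System.out.println(isNonprintableAscii('A')); // false
        System.out.println(isNonprintableAscii(' ')); // false
    }
}
```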

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


TABLE C.1 ASCII codes for selected characters


Appendix D