8. Java Data and Operators

8.7. Character Data and Operators

Another primitive data type in Java is the character type, char. A character in Java is represented by a 16-bit unsigned integer. This means that a total of 2^16 or 65536 different Unicode characters can be represented by a single char value, corresponding to the integer values 0 to 65535. The Unicode character set is an international standard that has been developed to enable computer languages to represent characters in a wide variety of languages, not just English. Detailed information about this encoding is available from the Unicode Consortium.

It is customary in programming languages to use unsigned integers to represent characters. This means that all the digits (0, ..., 9), alphabetic letters (a, ..., z, A, ..., Z), punctuation symbols (such as . ; , “ ‘ ’ ! -), and nonprinting control characters (LINE FEED, ESCAPE, CARRIAGE RETURN, ...) that make up the computer’s character set are represented in the computer’s memory by integers.

A more traditional set of characters is the ASCII (American Standard Code for Information Interchange) character set. ASCII is based on a 7-bit code and, therefore, defines 2^7 or 128 different characters, corresponding to the integer values 0 to 127. In order to make Unicode backward compatible with ASCII systems, the first 128 Unicode characters are identical to the ASCII characters. Thus, in both the ASCII and Unicode encoding, the printable characters have the integer values shown in Table 5.13.

 

TABLE 5.13 ASCII codes for selected characters

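The integer codes of individual characters can be checked directly in Java. The following is a minimal sketch rather than a reproduction of the table: the class name AsciiCodes, the helper codeOf, and the choice of sample characters are all assumptions made for illustration.

```java
// Illustrative sketch: AsciiCodes and codeOf are hypothetical names, not from the text.
class AsciiCodes {
    // Returns the integer code of a character by means of an explicit (int) cast.
    static int codeOf(char ch) {
        return (int) ch;
    }

    public static void main(String[] args) {
        // A few printable characters from the ASCII range (32 to 126).
        char[] samples = {' ', '0', '9', 'A', 'Z', 'a', 'z', '~'};
        for (char ch : samples) {
            System.out.println("'" + ch + "' = " + codeOf(ch));
        }
    }
}
```

Running the program lists each sample character next to its code, for example 48 for ‘0’, 65 for ‘A’, and 97 for ‘a’.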

 

 

Character to Integer Conversions

Is ‘A’ a character or an integer? The fact that character data are stored as integers in the computer’s memory can cause some confusion about whether a given piece of data is a character or an integer. In other words, when is a character, for example ‘A’, treated as the integer 65 instead of as the character ‘A’? The rule in Java is that a character literal (‘a’ or ‘A’ or ‘0’ or ‘?’) is always treated as a character, unless we explicitly tell Java to treat it as an integer. So if we display a literal’s value, for example with System.out.println('a'), the letter ‘a’ will be displayed. Similarly, if we assign ‘a’ to a char variable and then display the variable’s value, the letter ‘a’ will be shown.

If, on the other hand, we wish to output a character’s integer value, we must use an explicit cast operator, as in System.out.println((int) 'a'), which displays 97. A cast operation, such as (int), converts one type of data (‘a’) into another (97). This is known as a type conversion. Similarly, if we wish to store a character’s integer value in a variable, we can cast the char into an int, as in k = (int) ch for a char variable ch and an int variable k.
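The display and cast rules described above can be tried out in a short program. This is a minimal sketch: the class name CharToInt, the helper describe, and the variable name ch are assumptions introduced for illustration.

```java
// Illustrative sketch: CharToInt and describe are hypothetical names.
class CharToInt {
    // Builds a string showing a character next to its integer code.
    static String describe(char ch) {
        return ch + " -> " + (int) ch;
    }

    public static void main(String[] args) {
        char ch = 'a';
        System.out.println('a');          // a literal is displayed as a character: a
        System.out.println(ch);           // so is a char variable: a
        System.out.println((int) 'a');    // the explicit cast displays the code: 97

        int k = (int) ch;                 // store the character's integer value
        System.out.println(k);            // 97
        System.out.println(describe(ch)); // a -> 97
    }
}
```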

As these examples show, a cast is a type conversion operator. Java allows a wide variety of both explicit and implicit type conversions. Certain conversions (for example, promotions) take place when methods are invoked, when assignment statements are executed, when expressions are evaluated, and so on.

Type conversion in Java is governed by several rules and exceptions. In some cases Java allows the programmer to make implicit cast conversions. For example, in an assignment such as k = ch, where ch is a char variable and k is an int variable, the char is converted to an int even though no explicit cast operator is used.

Java permits this conversion because no information will be lost. A char is represented in 16 bits whereas an int is represented in 32 bits. This is like trying to put a small object into a large box. Space will be left over, but the object will fit inside without being damaged. Similarly, storing a 16-bit char in a 32-bit int will leave the extra 16 bits unused. This widening primitive conversion changes one primitive type (char) into a wider one (int), where a type’s width is the number of bits used in its representation.
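A widening conversion can be observed with a short program. This is a minimal sketch under stated assumptions: the class name WideningDemo and the helper widen are hypothetical, while Character.MAX_VALUE is the standard library constant for the largest char value.

```java
// Illustrative sketch: WideningDemo and widen are hypothetical names.
class WideningDemo {
    // Implicit widening: the 16-bit char is stored in a 32-bit int with no cast.
    static int widen(char ch) {
        int k = ch;
        return k;
    }

    public static void main(String[] args) {
        System.out.println(widen('a'));                 // 97
        System.out.println(widen(Character.MAX_VALUE)); // 65535, the largest char value
    }
}
```

Every char value from 0 to 65535 survives the conversion unchanged, which is why the compiler accepts the assignment without a cast.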

On the other hand, trying to assign the value of an int variable to a char variable, as in ch = k, leads to a compile-time error. Trying to assign a 32-bit int to a 16-bit char is like trying to fit a big object into an undersized box. The object won’t fit unless we shrink it in some way. Java will allow us to assign an int value to a char variable, but only if we perform an explicit cast on it, as in ch = (char) k.

The (char) cast operation performs a careful “shrinking” of the int by discarding its high-order 16 bits and keeping its low-order 16 bits. This can be done without loss of information provided that k’s value is in the range 0 to 65535, that is, in the range of values that fit into a char variable. This narrowing primitive conversion changes a wider type (32-bit int) to a narrower type (16-bit char). Because of the potential here for information loss, it is up to the programmer to determine that the cast can be performed safely.
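The effect of a narrowing conversion, including the loss of information for out-of-range values, can be sketched as follows. The class name NarrowingDemo, the helper narrow, and the variable tooBig are assumptions made for illustration.

```java
// Illustrative sketch: NarrowingDemo and narrow are hypothetical names.
class NarrowingDemo {
    // Explicit narrowing: only the low-order 16 bits of k are kept.
    static char narrow(int k) {
        return (char) k;
    }

    public static void main(String[] args) {
        int k = 97;
        char ch = narrow(k);                      // 97 fits in 16 bits, nothing is lost
        System.out.println(ch);                   // a

        // 65633 does not fit in 16 bits, so the cast loses information:
        // its low-order 16 bits are 97, the code for 'a'.
        int tooBig = 65536 + 97;
        System.out.println(narrow(tooBig));       // a
        System.out.println((int) narrow(tooBig)); // 97
    }
}
```

The second cast compiles and runs without complaint even though the original value 65633 cannot be recovered, which is why checking the range is left to the programmer.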

 

 

 

The cast operator can be used with any primitive type. It applies to the variable or expression that immediately follows it. Thus, parentheses must be used to cast the expression m + n into a char, as in char ch = (char)(m + n). A statement such as char ch = (char)m + n would cause a compile-time error because the cast operator would be applied only to m. In the expression on the right-hand side, the character produced by (char)m will be promoted to an int because it is part of an integer operation whose result will still be an int. Therefore, it cannot be assigned to a char without an explicit cast.
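The difference between casting a whole expression and casting only its first operand can be sketched as follows. The class name CastPrecedence, the helper sumAsChar, and the sample values of m and n are assumptions made for illustration.

```java
// Illustrative sketch: CastPrecedence and sumAsChar are hypothetical names.
class CastPrecedence {
    // The parentheses make the cast apply to the whole sum m + n.
    static char sumAsChar(int m, int n) {
        return (char) (m + n);
    }

    public static void main(String[] args) {
        int m = 60;
        int n = 5;

        char ch = sumAsChar(m, n);
        System.out.println(ch);            // A, because 60 + 5 is 65

        // The next line would not compile: (char) m is promoted back to int
        // when n is added, so the right-hand side is an int.
        // char bad = (char) m + n;

        int sum = (char) m + n;            // legal: the result is stored in an int
        System.out.println(sum);           // 65
    }
}
```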


Lexical Ordering

The order in which the characters of a character set are arranged, their lexical order, is an important feature of the character set. It especially comes into play for such tasks as arranging strings in alphabetical order.

Although the actual integer values assigned to the individual characters by ASCII and Unicode encoding seem somewhat arbitrary, the characters are, in fact, arranged in a particular order. For example, note that various sequences of digits, ‘0’...‘9’, and letters, ‘a’...‘z’ and ‘A’...‘Z’, are represented by sequences of integers (Table 5.13). This makes it possible to represent the lexical order of the characters in terms of the less than relationship among integers. The fact that ‘a’ comes before ‘f’ in alphabetical order is represented by the fact that 97 (the integer code for ‘a’) is less than 102 (the integer code for ‘f’). Similarly, the digit ‘5’ comes before the digit ‘9’ in an alphabetical sequence because 53 (the integer code for ‘5’) is less than 57 (the integer code for ‘9’).

 


This ordering relationship extends throughout the character set. Thus, it is also the case that ‘A’ comes before ‘a’ in the lexical ordering because 65 (the integer code for ‘A’) is less than 97 (the integer code for ‘a’). Similarly, the character ‘[’ comes before ‘}’ because its integer code (91) is less than 125, the integer code for ‘}’.
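Lexical order can be made visible by sorting a few characters by their integer codes. This is a minimal sketch: the class name LexicalOrder and the helper sortByCode are hypothetical, and it relies on the standard library method Arrays.sort, which orders a char array by numeric value.

```java
import java.util.Arrays;

// Illustrative sketch: LexicalOrder and sortByCode are hypothetical names.
class LexicalOrder {
    // Returns the characters of s rearranged in increasing order of integer code.
    static String sortByCode(String s) {
        char[] chars = s.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    public static void main(String[] args) {
        // Codes: '5' = 53, '9' = 57, 'A' = 65, '[' = 91, 'a' = 97, 'f' = 102, '}' = 125
        System.out.println(sortByCode("a[A}5f9")); // 59A[af}
    }
}
```

Digits come first, then uppercase letters, then ‘[’, then lowercase letters, then ‘}’, matching the integer codes cited in the text.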

 

Relational Operators

Given the lexical ordering of the char type, the following relational operators can be defined: <, >, <=, >=, ==, !=. Given any two characters, ch1 and ch2, the expression ch1 < ch2 is true if and only if the integer value of ch1 is less than the integer value of ch2. In this case we say that ch1 precedes ch2 in lexical order. Similarly, the expression ch1 > ch2 is true if and only if the integer value of ch1 is greater than the integer value of ch2. In this case we say that ch1 follows ch2. And so on for the other relational operators. This means that we can perform comparison operations on any two character operands (Table 5.14).

 

 

TABLE 5.14 Relational operations on characters

Operation             Operator   Java          True Expression
Precedes              <          ch1 < ch2     'a' < 'b'
Follows               >          ch1 > ch2     'c' > 'a'
Precedes or equals    <=         ch1 <= ch2    'a' <= 'a'
Follows or equals     >=         ch2 >= ch1    'a' >= 'a'
Equal to              ==         ch1 == ch2    'a' == 'a'
Not equal to          !=         ch1 != ch2    'a' != 'b'
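The relational operators can be applied to char operands directly, as the following minimal sketch shows. The class name CharRelations and the helpers precedes and isLowercaseLetter are assumptions made for illustration, and isLowercaseLetter checks only the ASCII letters ‘a’ through ‘z’.

```java
// Illustrative sketch: CharRelations and its helpers are hypothetical names.
class CharRelations {
    // True if ch1 comes before ch2 in lexical order.
    static boolean precedes(char ch1, char ch2) {
        return ch1 < ch2;
    }

    // True if ch lies in the ASCII range 'a'..'z' (codes 97 to 122).
    static boolean isLowercaseLetter(char ch) {
        return ch >= 'a' && ch <= 'z';
    }

    public static void main(String[] args) {
        System.out.println(precedes('a', 'b'));     // true
        System.out.println('c' > 'a');              // true
        System.out.println('a' <= 'a');             // true
        System.out.println('a' != 'b');             // true
        System.out.println(isLowercaseLetter('q')); // true
        System.out.println(isLowercaseLetter('Q')); // false, since 'Q' is 81
    }
}
```

Range tests such as isLowercaseLetter work only because the letters occupy consecutive integer codes.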