10. Strings and String Processing

10.9. Comparing Strings

 

Comparing strings is another important task. For example, when a word processor performs a search and replace operation, it needs to identify strings in the text that match the target string.

Strings are compared according to their lexicographic order, that is, the order of their characters. For the letters of the alphabet, lexicographic order just means alphabetical order. Thus, a comes before b and d comes after c. The string “hello” comes before “jello” because h comes before j in the alphabet.

For Java and other programming languages, the definition of lexicographic order is extended to cover all the characters that make up the character set. We know, for example, that in Java’s Unicode character set the uppercase letters come before the lowercase letters (Table 5.13). So, the letter H comes before the letter h and the letter Z comes before the letter a.

Lexicographic order can be extended to include strings of characters. Thus, “Hello” precedes “hello” in lexicographic order because its first letter, H, precedes the first letter, h, in “hello.” Similarly, the string “Zero” comes before “aardvark,” because Z comes before a. To determine lexicographic order for strings, we must perform a character-by-character comparison, starting at the first character and proceeding left to right. As an example, the following strings are arranged in lexicographic order:
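One way to see such an ordering is to let Java sort a handful of strings. The sketch below is only an illustration: the sample words are drawn from this section, and the class name SortDemo and the helper sorted() are hypothetical. It relies on the fact that Arrays.sort() on a String array uses the strings' natural (lexicographic) ordering.

```java
import java.util.Arrays;

public class SortDemo {
    // Returns a sorted copy of words; Arrays.sort() on a String[] uses
    // String.compareTo(), which compares strings lexicographically.
    public static String[] sorted(String[] words) {
        String[] copy = words.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // Sample words chosen for illustration only
        String[] words = {"hello", "Zero", "aardvark", "jello", "Hello", "alpha", "alphabet"};
        System.out.println(Arrays.toString(sorted(words)));
        // prints [Hello, Zero, aardvark, alpha, alphabet, hello, jello]
    }
}
```

Note that both capitalized words come first, because every uppercase letter precedes every lowercase letter, and that “alpha” precedes “alphabet” because it is a shorter string with the same leading characters.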


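We can define lexicographic order for strings as follows: string s1 precedes string s2 if, at the first position where the two strings differ, s1’s character precedes s2’s character. If one string ends before any difference is found, the shorter string precedes the longer.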

Perhaps a more precise way to define lexicographic order is to define a Java method:
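A method along these lines can be sketched from the description given in this section. The version below is a reconstruction under that assumption rather than a definitive listing: the class name LexOrder, the static modifier, and the main() driver are added here for illustration.

```java
public class LexOrder {
    // Returns true if s1 strictly precedes s2 in lexicographic order.
    public static boolean precedes(String s1, String s2) {
        // Use the shorter length as the loop bound
        int minLen = Math.min(s1.length(), s2.length());
        for (int k = 0; k < minLen; k++) {
            // At the first unequal pair, the smaller character decides
            if (s1.charAt(k) != s2.charAt(k))
                return s1.charAt(k) < s2.charAt(k);
        }
        // All compared characters were equal: the shorter string precedes
        return s1.length() < s2.length();
    }

    public static void main(String[] args) {
        System.out.println(precedes("alpha", "alphabet")); // true
        System.out.println(precedes("Zero", "aardvark"));  // true
        System.out.println(precedes("jello", "hello"));    // false
    }
}
```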


This method does a character-by-character comparison of the two strings, proceeding left to right, starting at the first character in both strings. Its for loop uses a counting bound, which starts at k equal to zero and counts up to, but does not include, the length of the shorter string. This is an important point in designing this algorithm. If you don’t stop iterating when you get past the last character in a string, your program will generate a StringIndexOutOfBoundsException. To prevent this error, we need to use the shorter length as the loop bound.
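The effect of a wrong loop bound can be seen in a small experiment. The sketch below is hypothetical (the class LoopBoundDemo and the method overrunsShorter() are not part of the text): it deliberately loops up to the longer length and reports whether charAt() ran past the end of the shorter string.

```java
public class LoopBoundDemo {
    // Deliberately uses the LONGER length as the loop bound.
    // Returns true if the loop ran past the end of the shorter string.
    public static boolean overrunsShorter(String s1, String s2) {
        int maxLen = Math.max(s1.length(), s2.length());
        try {
            for (int k = 0; k < maxLen; k++) {
                if (s1.charAt(k) != s2.charAt(k))
                    return false; // strings differ before either one ends
            }
            return false; // same length and same characters: no overrun
        } catch (StringIndexOutOfBoundsException e) {
            return true; // charAt(k) was called with k past the last index
        }
    }

    public static void main(String[] args) {
        System.out.println(overrunsShorter("alpha", "alphabet")); // true
        System.out.println(overrunsShorter("hello", "jello"));    // false
    }
}
```

With “alpha” and “alphabet”, the first five characters match, so the loop reaches k equal to 5, which is not a valid index in “alpha”.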

Note that the loop will terminate early if it finds that the respective characters from s1 and s2 are unequal. In that case, s1 precedes s2 if s1’s kth character precedes s2’s. If the loop terminates normally, that means that all the characters compared were equal. In that case, the shorter string precedes the longer. For example, if the two strings were “alpha” and “alphabet,” then the method would return true, because “alpha” is shorter than “alphabet.”

 

SELF-STUDY EXERCISES

EXERCISE 7.14 Arrange the following strings in lexicographic order:

 


EXERCISE 7.15 Modify the precedes() method so that it will also return true when s1 and s2 are equal—for example, when s1 and s2 are both “hello”.

 

Object Identity Versus Object Equality

 

 

 

 

 

 

Java provides several methods for comparing Strings:
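Three comparison methods that the standard String class is known to provide are equals(), equalsIgnoreCase(), and compareTo(). The sketch below shows them in use; the sample values of s1 and s2, the class name CompareDemo, and the helper sign() are assumptions made for illustration.

```java
public class CompareDemo {
    // Hypothetical helper: reduces compareTo()'s result to -1, 0, or 1
    public static int sign(String a, String b) {
        return Integer.signum(a.compareTo(b));
    }

    public static void main(String[] args) {
        String s1 = "hello"; // assumed sample values
        String s2 = "Hello";

        System.out.println(s1.equals(s2));           // false: case differs
        System.out.println(s1.equals("hello"));      // true
        System.out.println(s1.equalsIgnoreCase(s2)); // true
        System.out.println(sign(s1, s2));            // 1: 'h' follows 'H'
        System.out.println(sign("alpha", "alphabet")); // -1: shorter precedes
    }
}
```

compareTo() returns a negative value, zero, or a positive value depending on whether the receiver precedes, equals, or follows its argument in lexicographic order.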


The first comparison method, equals(), overrides the Object.equals() method. Two Strings are equal if they have the exact same letters in the exact same order. Thus, for the following declarations,

s1.equals(s2) is false, but s1.equals("hello") is true.

You have to be careful when using Java’s equals() method. According to the default definition of equals(), defined in the Object class, “equals” means “identical.” Two Objects are equal only if their names are references to the same object.


This is like the old story of the morning star and the evening star, which were thought to be different objects before it was discovered that both were just the planet Venus. After the discovery, it was clear that “the morning star” and “the evening star” and “Venus” were just three different references to one and the same object (Fig. 7.12).

Figure 7.12: Venus is the morning star, so “Venus” and “the morning star” are two references to the same object.

We can create an analogous situation in Java by using the following JButton definitions:

    JButton b1 = new JButton("a");
    JButton b2 = new JButton("a");
    JButton b3 = b2;

Given these three declarations, b1.equals(b2) and b1.equals(b3) would be false, but b2.equals(b3) would be true because b2 and b3 are just two names for the same object (Fig. 7.13). So, in this case, “equals” really means “identical.”

Moreover, in Java, when it is used to compare two objects, the equality operator (==) is interpreted in the same way as the default Object.equals() method. So, it really means object identity. Thus, b1 == b2 would be false, because b1 and b2 are different objects, but b2 == b3 would be true because b2 and b3 refer to the same object.
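The same behavior can be observed without any GUI classes. The sketch below uses a hypothetical class, Planet, that does not override equals() and therefore inherits the default definition from Object. The class and variable names are illustrative only.

```java
public class IdentityDemo {
    // Hypothetical class that inherits Object.equals() unchanged
    static class Planet {
        final String name;
        Planet(String name) { this.name = name; }
    }

    public static void main(String[] args) {
        Planet venus = new Planet("Venus");
        Planet morningStar = venus;         // second reference, same object
        Planet other = new Planet("Venus"); // distinct object, same name

        System.out.println(venus.equals(morningStar)); // true
        System.out.println(venus == morningStar);      // true
        System.out.println(venus.equals(other));       // false
        System.out.println(venus == other);            // false
    }
}
```

Because Planet relies on the default equals(), two distinct objects compare as unequal even though they hold the same name, just as == reports.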

Figure 7.13: For most objects, equality means identity. JButtons b2 and b3 are identical (and, hence, equal), but JButtons b1 and b2 are not identical (and, hence, unequal).

These points are illustrated in the program shown in Figure 7.14. This program uses methods isEqual() and isIdentical() to perform the comparisons and print the results.

    import java.awt.*;

    public class TestEquals {
        static Button b1 = new Button("a");
        static Button b2 = new Button("b");
        static Button b3 = b2;

        private static void isEqual(Object o1, Object o2) {
            if (o1.equals(o2))
                System.out.println(o1.toString() + " equals " + o2.toString());
            else
                System.out.println(o1.toString() + " does NOT equal " +
                                   o2.toString());
        } // isEqual()

        private static void isIdentical(Object o1, Object o2) {
            if (o1 == o2)
                System.out.println(o1.toString() + " is identical to " +
                                   o2.toString());
            else
                System.out.println(o1.toString() + " is NOT identical to " +
                                   o2.toString());
        } // isIdentical()

        public static void main(String argv[]) {
            isEqual(b1, b2);     // not equal
            isEqual(b1, b3);     // not equal
            isEqual(b2, b3);     // equal

            isIdentical(b1, b2); // not identical
            isIdentical(b1, b3); // not identical
            isIdentical(b2, b3); // identical
        } // main()
    } // TestEquals

Figure 7.14: The TestEquals program tests Java’s default equals() method, which is defined in the Object class.

This program will produce the following output:



String Identity Versus String Equality

In comparing Java Strings, we must be careful to distinguish between object identity and string equality. Thus, consider the following declarations, which create the situation shown in Figure 7.15.

    String s1 = new String("hello");
    String s2 = new String("hello");
    String s3 = new String("Hello");
    String s4 = s1;
    String s5 = "hello";
    String s6 = "hello";

Figure 7.15: For String objects, equality and identity are different. Two distinct (nonidentical) String objects are equal if they have the same string value. So s1, s2, s4, s5, and s6 are equal. Strings s1 and s4 are identical, and so are strings s5 and s6.

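Given these declarations, we would get the following results if we compare the equality of the Strings:

    s1.equals(s2)    // true
    s1.equals(s3)    // false
    s1.equals(s4)    // true
    s1.equals(s5)    // true
    s5.equals(s6)    // true

and the following results if we compare their identity:

    s1 == s2         // false
    s1 == s3         // false
    s1 == s4         // true
    s1 == s5         // false
    s5 == s6         // true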

The only true identities among these Strings are s1 and s4, and s5 and s6. In the case of s5 and s6, both are just references to the literal string, “hello”, as we described in Section 7.2. The program in Figure 7.16 illustrates these points.

    import java.awt.*;

    public class TestStringEquals {
        static String s1 = new String("hello"); // s1 and s2 are equal, not identical
        static String s2 = new String("hello");
        static String s3 = new String("Hello"); // s1 and s3 are not equal
        static String s4 = s1;                  // s1 and s4 are identical
        static String s5 = "hello";             // s1 and s5 are not identical
        static String s6 = "hello";             // s5 and s6 are identical

        private static void testEqual(String str1, String str2) {
            if (str1.equals(str2))
                System.out.println(str1 + " equals " + str2);
            else
                System.out.println(str1 + " does not equal " + str2);
        } // testEqual()

        private static void testIdentical(String str1, String str2) {
            if (str1 == str2)
                System.out.println(str1 + " is identical to " + str2);
            else
                System.out.println(str1 + " is not identical to " + str2);
        } // testIdentical()

        public static void main(String argv[]) {
            testEqual(s1, s2);     // equal
            testEqual(s1, s3);     // not equal
            testEqual(s1, s4);     // equal
            testEqual(s1, s5);     // equal
            testEqual(s5, s6);     // equal

            testIdentical(s1, s2); // not identical
            testIdentical(s1, s3); // not identical
            testIdentical(s1, s4); // identical
            testIdentical(s1, s5); // not identical
            testIdentical(s5, s6); // identical
        } // main()
    } // TestStringEquals

    Program Output
    hello equals hello
    hello does not equal Hello
    hello equals hello
    hello equals hello
    hello equals hello
    hello is not identical to hello
    hello is not identical to Hello
    hello is identical to hello
    hello is not identical to hello
    hello is identical to hello

Figure 7.16: Program illustrating the difference between string equality and identity.

SELF-STUDY EXERCISES

EXERCISE 7.16 Given the String declarations,

evaluate the following expressions:

    s1 == s2
    s1.equals(s2)
    s1 == s3
    s1.equals(s3)
    s2 == s3
    s2.equals(s4)
    s2 == s4
    s1 == s5
    s4 == s5

 

 

EXERCISE 7.17 Why are the variables in TestStringEquals declared static?

EXERCISE 7.18 Given the following declarations,

write Java expressions to carry out each of the following operations:

Swap the front and back half of s1 giving a new string.

Swap “world” and “hello” in s2 giving a new string.

Combine parts of s1 and s2 to create a new string “hello abc”.