10. Strings and String Processing

10.10. From the Java Library: java.util.StringTokenizer

One of the most common string-processing tasks is breaking a string into its components, or tokens. For example, when processing a sentence, you may need to break the sentence into its constituent words, which are considered the sentence's tokens. When processing a name-password string, such as “boyd:14irXp”, you may need to break it into a name and a password. Tokens are separated from each other by one or more characters known as delimiters. Thus, for a sentence, white space characters, including blanks, tabs, and line feeds, serve as delimiters. For the password example, the colon character serves as the delimiter.
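To make the idea of tokens and delimiters concrete, here is a minimal hand-written tokenizer. It is our own illustration (the class ManualTokenizer and its tokenize() method are hypothetical names), not the way java.util.StringTokenizer is actually implemented. A token is taken to be a maximal run of non-delimiter characters, so several delimiters in a row act as a single separator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written tokenizer, for illustration only.
// A token is a maximal run of characters that do not appear in 'delimiters';
// consecutive delimiters are treated as one separator, so no empty tokens occur.
class ManualTokenizer {

    static List<String> tokenize(String s, String delimiters) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            // Skip over any run of delimiter characters.
            while (i < s.length() && delimiters.indexOf(s.charAt(i)) >= 0) {
                i++;
            }
            int start = i;
            // Advance to the end of the current token.
            while (i < s.length() && delimiters.indexOf(s.charAt(i)) < 0) {
                i++;
            }
            if (i > start) {
                tokens.add(s.substring(start, i));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("boyd:14irXp", ":"));           // [boyd, 14irXp]
        System.out.println(tokenize("one  two\tthree\n", " \t\n"));  // [one, two, three]
    }
}
```

Because the delimiter set is just a String, the same method handles both the sentence case (white space) and the name-password case (a colon).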

Java’s java.util.StringTokenizer class is specially designed for breaking strings into their tokens (Fig. 7.17); it is described in the Java API documentation at java.sun.com/j2se/1.5.0/docs/api/. When instantiated with a String parameter, a StringTokenizer breaks the string into tokens, using white space as delimiters. For example, if we instantiated a StringTokenizer as in the code

it would break the string into the following tokens, which the StringTokenizer would return, one at a time, in the order shown:


Note that the period is part of the last token (“sentence.”). This is because punctuation marks are not considered delimiters by default.
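The following sketch shows the one-argument constructor at work, using the hasMoreTokens() and nextToken() methods described later in this section to collect the tokens. The class name DefaultDelimiters and the sample sentence are our own choices; the point is that the period stays attached to the final word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Sketch: the one-argument constructor uses the default delimiter set
// " \t\n\r\f" (blank, tab, newline, carriage return, form feed).
// Class name and sample text are illustrative.
class DefaultDelimiters {

    static List<String> tokens(String s) {
        StringTokenizer st = new StringTokenizer(s);
        List<String> result = new ArrayList<>();
        while (st.hasMoreTokens()) {
            result.add(st.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints Tokens, are, separated, by, white, space. (one per line);
        // the period is not a delimiter, so it stays in the last token.
        for (String token : tokens("Tokens are\tseparated by white space.")) {
            System.out.println(token);
        }
    }
}
```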


Figure 7.17: The java.util.StringTokenizer class.

If you wanted to include punctuation symbols as delimiters, you could use the second StringTokenizer() constructor, which takes a second String parameter (Fig. 7.17). The second parameter specifies a string of those characters that should be used as delimiters. For example, in the instantiation,


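various punctuation symbols (periods, commas, and so on) are included among the delimiters. Note that the escape sequences \t and \n specify tabs and newlines; a blank is specified by an ordinary space character in the delimiter string, since \b in Java denotes a backspace, not a blank.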
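Here is a sketch of the two-argument constructor with a delimiter string that adds punctuation to the usual white space. The exact delimiter set, the class name PunctuationDelimiters, and the sample text are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Sketch: a custom delimiter set. The string begins with an ordinary space
// (the blank); "\t" and "\n" are the tab and newline escapes; the rest are
// punctuation marks. This particular set is an illustrative choice.
class PunctuationDelimiters {

    static final String DELIMITERS = " \t\n.,;:!?";

    static List<String> words(String s) {
        StringTokenizer st = new StringTokenizer(s, DELIMITERS);
        List<String> result = new ArrayList<>();
        while (st.hasMoreTokens()) {
            result.add(st.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [Hello, world, Is, this, a, sentence]
        System.out.println(words("Hello, world! Is this a sentence?"));
    }
}
```

With punctuation in the delimiter set, “sentence” no longer carries its trailing question mark, in contrast to the default behavior noted earlier.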

The hasMoreTokens() and nextToken() methods can be used to process a delimited string, one token at a time. The first method returns true as long as more tokens remain; the second gets the next token in the list. For example, here’s a code segment that will break a standard URL string into its constituent parts:


This code segment will produce the following output:


The only delimiters used in this case were the “:” and “/” symbols. Note also that nextToken() does not return the empty string between “:” and “/” as a token.
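A sketch of URL tokenizing along the same lines, assuming an illustrative URL and a hypothetical class name UrlTokens. With ":/" as the delimiter string, the empty strings around “://” are never returned as tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Sketch: split a URL using only ':' and '/' as delimiters.
// Adjacent delimiters (as in "://") produce no empty tokens.
// The URL and class name are illustrative.
class UrlTokens {

    static List<String> parts(String url) {
        StringTokenizer st = new StringTokenizer(url, ":/");
        List<String> result = new ArrayList<>();
        while (st.hasMoreTokens()) {
            result.add(st.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints http, www.example.com, docs, index.html (one per line)
        for (String part : parts("http://www.example.com/docs/index.html")) {
            System.out.println(part);
        }
    }
}
```

By contrast, String.split("[:/]") applied to the same URL returns {"http", "", "", "www.example.com", "docs", "index.html"}, including two empty strings, so the two approaches are not interchangeable when delimiters can appear side by side.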