14. Files and Streams: Input/Output Techniques

14.2. Streams and Files

As was noted in Chapter 4, all input and output (I/O) in Java is accomplished through the use of input streams and output streams. You are already familiar with input and output streams because we have routinely used the System.out output stream and the System.in input stream (Fig. 11.1) in this text’s examples. Recall that System.out usually connects your program (source) to the screen (destination) and System.in usually connects the keyboard (source) to the running program (destination). What you have learned about streams will also be key to connecting files to a program.

 

[Figure 11.1 (diagram not recovered); surviving labels: Memory, Keyboard.]

 

The Data Hierarchy

Data, or information, are the contents that flow through Java streams or are stored in files. All data are composed of binary digits, or bits. A bit is simply a 0 or a 1, each value corresponding to one of two electronic states. As we learned in Chapter 5, a bit is the smallest unit of data.

However, it would be tedious if a program had to work with data in units as small as bits. Therefore, most operations involve various-sized aggregates of data such as an 8-bit byte, a 16-bit short, a 16-bit char, a 32-bit int, a 64-bit long, a 32-bit float, or a 64-bit double. As we know, these are Java’s primitive numeric types. In addition to these aggregates, we can group together a sequence of char to form a String.
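The sizes listed above can be checked from within a program. The following is a minimal sketch that relies on the SIZE constants of the standard wrapper classes; the class name PrimitiveSizes and its method are illustrative assumptions, not part of the text.

```java
// Illustrative sketch: reports the width in bits of each primitive type named above.
final class PrimitiveSizes {

    /** Returns the bit widths in the order byte, short, char, int, long, float, double. */
    static int[] bitWidths() {
        return new int[] {
            Byte.SIZE, Short.SIZE, Character.SIZE, Integer.SIZE,
            Long.SIZE, Float.SIZE, Double.SIZE
        };
    }

    public static void main(String[] args) {
        String[] names = {"byte", "short", "char", "int", "long", "float", "double"};
        int[] widths = bitWidths();
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + ": " + widths[i] + " bits");
        }
    }
}
```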

It is also possible to group data of different types into objects. A record, which corresponds closely to a Java object, can have fields that contain different types of data. For example, a student record might contain fields for the student’s name and address (Strings), expected year of graduation (int), and current grade point average (double). Collections of these records are typically grouped into files. For example, your registrar’s office may have a separate file for each of its graduating classes. A collection of such related files is called a database.
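As a rough illustration of how such a record maps onto a Java object, the student record described above might be sketched as follows. The class name StudentRecord, its field names, and the sample values are hypothetical and chosen only for this example.

```java
// Hypothetical record type: one field per item in the student record described above.
final class StudentRecord {
    private final String name;
    private final String address;
    private final int graduationYear;
    private final double gpa;

    StudentRecord(String name, String address, int graduationYear, double gpa) {
        this.name = name;
        this.address = address;
        this.graduationYear = graduationYear;
        this.gpa = gpa;
    }

    String getName() { return name; }
    String getAddress() { return address; }
    int getGraduationYear() { return graduationYear; }
    double getGpa() { return gpa; }

    @Override
    public String toString() {
        return name + ", " + address + ", class of " + graduationYear + ", GPA " + gpa;
    }

    public static void main(String[] args) {
        // Sample values are invented for the demonstration.
        StudentRecord student = new StudentRecord("Ada Lovelace", "12 Example Road", 2027, 3.9);
        System.out.println(student);
    }
}
```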


[Figure 11.2 diagram; surviving labels: Database, File, Record, Field, Bit.]

 

Taken together, the different kinds of data that are processed by a computer or stored in a file can be organized into a data hierarchy (Fig. 11.2). It’s important to recognize that while we, the programmers, may group data into various types of abstract entities, the information flowing through an input or output stream is just a sequence of bits. There are no natural boundaries that mark where one byte (or one int or one record) ends and the next one begins. Therefore, it will be up to us to provide the boundaries as we process the data.
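The point about boundaries can be made concrete with a small sketch. Here two int values are written to an in-memory stream, which then holds nothing but eight undifferentiated bytes; the reader recovers the two values only because it chooses to read in 4-byte units. The class and method names are assumptions made for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Illustrative sketch: the stream stores only bytes; the reader supplies the boundaries.
final class StreamBoundaries {

    /** Writes two ints to an in-memory stream and returns the raw bytes. */
    static byte[] writeTwoInts(int a, int b) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(a);
            out.writeInt(b);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the bytes back, imposing 4-byte int boundaries on them. */
    static int[] readTwoInts(byte[] raw) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            int first = in.readInt();
            int second = in.readInt();
            return new int[] { first, second };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] raw = writeTwoInts(1, 258);
        System.out.println(raw.length + " bytes: " + Arrays.toString(raw));
        System.out.println("as ints: " + Arrays.toString(readTwoInts(raw)));
    }
}
```

Nothing in the eight bytes says that they hold two ints rather than, say, one long or eight separate byte values; that interpretation lives entirely in the reading code.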

Binary Files and Text Files

As we noted in Chapter 4, there are two types of files in Java: binary files and text files. Both kinds store data as a sequence of bits, that is, a sequence of 0’s and 1’s. Thus, the difference between the two types of files lies in the way they are interpreted by the programs that read and write them. A binary file is processed as a sequence of bytes, whereas a text file is processed as a sequence of characters.

Text editors and other programs that process text files interpret the file’s sequence of bits as a sequence of characters—that is, as a string.

Your Java source programs (*.java) are text files, and so are the HTML files that populate the World Wide Web. The big advantage of text files is their portability. Because their data are represented in the ASCII code (Table 5.13), they can be read and written by just about any text-processing program. Thus, a text file created by a program on a Windows/Intel computer can be read by a Macintosh program.

Figure 11.2: The data hierarchy.

In non-Java environments, data in binary files are stored as bytes, and the representation used varies from computer to computer. The manner in which a computer’s memory stores binary data determines how it is represented in a file. Thus, binary data are not very portable. For example, a binary file of integers created on a Macintosh generally cannot be read correctly by a Windows/Intel program without conversion.

 

 



One reason for the lack of portability is that each type of computer has its own definition of an integer. On some systems an integer might be 16 bits, and on others it might be 32 bits, so even if you know that a Macintosh binary file contains integers, that still won’t make it readable by Windows/Intel programs. Another problem is that even if two computers use the same number of bits to represent an integer, they might use different representation schemes. For example, some computers might use 10000101 as the 8-bit representation of the number 133, whereas other computers might use the reverse, 10100001, to represent 133.
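The 133 example above reverses the order of bits. A closely related difference that arises in practice is byte order: the same 32-bit integer may be laid out most-significant byte first (big-endian) or least-significant byte first (little-endian). The sketch below shows byte order rather than bit reversal, using java.nio.ByteBuffer to produce both layouts; the class and method names are illustrative assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Illustrative sketch: one int value, two possible byte layouts.
final class ByteOrderDemo {

    /** Encodes value as 4 bytes using the requested byte order. */
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    public static void main(String[] args) {
        // 133 is 0x85, which Java prints as the signed byte -123.
        System.out.println("big-endian:    " + Arrays.toString(encode(133, ByteOrder.BIG_ENDIAN)));
        System.out.println("little-endian: " + Arrays.toString(encode(133, ByteOrder.LITTLE_ENDIAN)));
    }
}
```

A program that assumes one layout while reading a file written with the other will see a different number, even though both files contain the same four byte values.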

The good news for us is that Java’s designers have made its binary files platform independent by carefully defining the exact size and representation that must be used for integers and all other primitive types. Thus, binary files created by Java programs can be interpreted by Java programs on any platform.

 

JAVA LANGUAGE RULE: Platform Independence. Java binary files are platform independent. They can be interpreted by any computer that supports Java.

 

Input and Output Streams

Java has a wide variety of streams for performing I/O. They are defined in the java.io package, which must be imported by any program that does I/O. They are generally organized into the hierarchy illustrated in Figure 11.3. We will cover only a small portion of the hierarchy in this text. Generally speaking, binary files are processed by subclasses of InputStream and OutputStream. Text files are processed by subclasses of Reader and Writer, both of which are streams, despite their names.

InputStream and OutputStream are abstract classes that serve as the root classes for reading and writing binary data. Their most commonly used subclasses are DataInputStream and DataOutputStream, which are used for processing String data and data of any of Java’s primitive types (char, boolean, int, double, and so on). The analogues of these classes for processing text data are the Reader and Writer classes, which serve as the root classes for all text I/O.
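As a minimal sketch of how these two classes are typically paired, the following writes a String and two primitive values to a temporary binary file and reads them back in the same order. The class name, the file prefix, and the sample values are assumptions for this example; the stream methods used (writeUTF, writeInt, writeDouble and their read counterparts) are standard java.io calls.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative sketch: binary output and input of a String, an int, and a double.
final class DataStreamRoundTrip {

    /** Writes the three values to a temporary file, reads them back, and joins them with spaces. */
    static String roundTrip(String name, int year, double gpa) {
        try {
            File file = File.createTempFile("student", ".dat");
            file.deleteOnExit();

            try (DataOutputStream out = new DataOutputStream(new FileOutputStream(file))) {
                out.writeUTF(name);    // length-prefixed string
                out.writeInt(year);    // 4 bytes
                out.writeDouble(gpa);  // 8 bytes
            }

            // Values must be read in the same order and with the same types as written.
            try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
                String readName = in.readUTF();
                int readYear = in.readInt();
                double readGpa = in.readDouble();
                return readName + " " + readYear + " " + readGpa;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("Ada", 2027, 3.5));
    }
}
```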

 

JAVA PROGRAMMING TIP: Choosing a Stream. When choosing an appropriate stream for an I/O operation, keep in mind that DataInputStream and DataOutputStream are normally used for binary I/O, and Reader and Writer streams are normally used for text I/O.

 

The various subclasses of these root classes perform various specialized I/O operations. For example, FileInputStream and FileOutputStream are used for performing binary input and output on files. The PrintStream class contains methods for outputting various primitive data (integers, floats, and so forth) as text. The System.out stream, one of the most widely used output streams, is an object of this type. The PrintWriter class, which was introduced in JDK 1.1, contains the same methods as PrintStream, but the methods are designed to support platform independence and internationalized I/O, that is, I/O that works in different languages and alphabets.

Figure 11.3: Java’s stream hierarchy.

 

 

 

 

 

PrintWriter
+PrintWriter(in out : OutputStream)
+PrintWriter(in out : Writer)
+print(in i : int)
+print(in l : long)
+print(in f : float)
+print(in d : double)
+print(in s : String)
+print(in o : Object)
+println(in i : int)
+println(in l : long)
+println(in f : float)
+println(in d : double)
+println(in s : String)
+println(in o : Object)

Figure 11.4: PrintWriter methods print data of various types.


The various methods defined in PrintWriter are designed to output a particular type of primitive data (Fig. 11.4). As you would expect, there is both a print() and println() method for each kind of data that the programmer wants to output.
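A short sketch of these overloaded methods in use is given below. It wraps a StringWriter so the output can be inspected as a String, and then wraps System.out to show the OutputStream constructor. The class name PrintWriterDemo and its format() helper are assumptions for the illustration.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Illustrative sketch: the same print() name handles several data types.
final class PrintWriterDemo {

    /** Prints an int, a double, and a String as text, separated by single spaces. */
    static String format(int i, double d, String s) {
        StringWriter text = new StringWriter();
        PrintWriter out = new PrintWriter(text);   // PrintWriter(Writer) constructor
        out.print(i);
        out.print(' ');
        out.print(d);
        out.print(' ');
        out.print(s);
        out.flush();
        return text.toString();
    }

    public static void main(String[] args) {
        // PrintWriter(OutputStream, autoFlush) constructor; println() flushes automatically.
        PrintWriter console = new PrintWriter(System.out, true);
        console.println(format(42, 2.5, "text"));
    }
}
```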

Table 11.1 briefly describes Java’s most commonly used input and output streams. In addition to the ones we’ve already mentioned, you are already familiar with methods from the BufferedReader and File classes, which were used in Chapter 4.

Filtering refers to performing operations on data while the data are being input or output. Methods in the FilterInputStream and FilterReader classes can be used to filter binary and text data during input. Methods in the FilterOutputStream and FilterWriter classes can be used to filter output data. These classes serve as the root classes for various filtering subclasses. They can also be subclassed to perform customized data filtering.

TABLE 11.1 Description of some of Java’s important stream classes.

Class                   Description
InputStream             Abstract root class of all binary input streams
FileInputStream         Provides methods for reading bytes from a binary file
FilterInputStream       Provides methods required to filter data
BufferedInputStream     Provides input data buffering for reading large files
ByteArrayInputStream    Provides methods for reading an array as if it were a stream
DataInputStream         Provides methods for reading Java’s primitive data types
PipedInputStream        Provides methods for reading piped data from another thread
OutputStream            Abstract root class of all binary output streams
FileOutputStream        Provides methods for writing bytes to a binary file
FilterOutputStream      Provides methods required to filter data
BufferedOutputStream    Provides output data buffering for writing large files
ByteArrayOutputStream   Provides methods for writing an array as if it were a stream
DataOutputStream        Provides methods for writing Java’s primitive data types
PipedOutputStream       Provides methods for writing piped data to another thread
PrintStream             Provides methods for writing primitive data as text

Reader                  Abstract root class for all text input streams
BufferedReader          Provides buffering for character input streams
CharArrayReader         Provides input operations on char arrays
FileReader              Provides methods for character input on files
FilterReader            Provides methods to filter character input
StringReader            Provides input operations on Strings

Writer                  Abstract root class for all text output streams
BufferedWriter          Provides buffering for character output streams
CharArrayWriter         Provides output operations to char arrays
FileWriter              Provides methods for output to text files
FilterWriter            Provides methods to filter character output
PrintWriter             Provides methods for printing binary data as characters
StringWriter            Provides output operations to Strings

One type of filtering is buffering, which is provided by several buffered streams, including BufferedInputStream and BufferedReader, for performing binary and text input, and BufferedOutputStream and BufferedWriter, for buffered output operations. As was discussed in Chapter 4, a buffer is a relatively large region of memory used to temporarily store data while they are being input or output. When buffering is used, a program will transfer a large number of bytes into the buffer from the relatively slow input device and then transfer these to the program as each read operation is performed. The transfer from the buffer to the program’s memory is very fast.

Similarly, when buffering is used during output, data are transferred directly to the buffer and then written to the disk when the buffer fills up or when the flush() method is called.
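The effect of flush() on buffered output can be observed with a small sketch. A StringWriter stands in here for the slow destination, and the buffer size of 1024 characters is an arbitrary choice for the demonstration; the class and method names are likewise assumptions. For text shorter than the buffer, nothing is expected to reach the destination until flush() is called.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

// Illustrative sketch: buffered output reaches its destination only on flush (or when the buffer fills).
final class BufferingDemo {

    /** Returns how many characters had reached the destination before and after flush(). */
    static int[] charsBeforeAndAfterFlush(String text) {
        StringWriter destination = new StringWriter();   // stands in for a disk file
        try (BufferedWriter out = new BufferedWriter(destination, 1024)) {
            out.write(text);                             // held in the 1024-character buffer
            int before = destination.getBuffer().length();
            out.flush();                                 // pushes the buffered characters onward
            int after = destination.getBuffer().length();
            return new int[] { before, after };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] counts = charsBeforeAndAfterFlush("hello");
        System.out.println("before flush: " + counts[0] + ", after flush: " + counts[1]);
    }
}
```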

You can also define your own data filtering subclasses to perform customized filtering. For example, suppose you want to add line numbers to a text editor’s printed output. To perform this task, you could define a FilterWriter subclass and override its write() methods to perform the desired filtering operation. Similarly, to remove the line numbers from such a file during input, you could define a FilterReader subclass. In that case, you would override its read() methods to suit your goals for the program.
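One possible shape for the line-numbering filter described above is sketched below. It is only an illustration: the class name LineNumberWriter, the "N: " prefix format, and the number() helper are assumptions, and a production version would need to decide how to treat other line terminators such as "\r\n".

```java
import java.io.FilterWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical filter: prefixes every output line with its line number.
final class LineNumberWriter extends FilterWriter {
    private int lineNumber = 1;
    private boolean atLineStart = true;

    LineNumberWriter(Writer out) {
        super(out);
    }

    @Override
    public void write(int c) throws IOException {
        if (atLineStart) {
            out.write(lineNumber + ": ");   // 'out' is the wrapped Writer inherited from FilterWriter
            lineNumber++;
            atLineStart = false;
        }
        out.write(c);
        if (c == '\n') {
            atLineStart = true;             // the next character begins a new line
        }
    }

    @Override
    public void write(char[] buf, int off, int len) throws IOException {
        for (int i = off; i < off + len; i++) {
            write(buf[i]);
        }
    }

    @Override
    public void write(String s, int off, int len) throws IOException {
        for (int i = off; i < off + len; i++) {
            write(s.charAt(i));
        }
    }

    /** Convenience helper: runs text through the filter and returns the numbered result. */
    static String number(String text) {
        StringWriter sink = new StringWriter();
        try (LineNumberWriter writer = new LineNumberWriter(sink)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        System.out.print(number("first line\nsecond line\n"));
    }
}
```

All three write() methods are overridden so that every path into the filter passes through the single-character version, where the numbering decision is made.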

There are several classes that provide I/O-like operations on various internal memory structures. ByteArrayInputStream, ByteArrayOutputStream, CharArrayReader, and CharArrayWriter are four classes that take input from or send output to arrays in the program’s memory. Methods in these classes can be useful for performing various operations on data during input or output. For example, suppose a program reads an entire line of integer data from a binary file into a byte array. It might then transform the data by, say, computing the remainder modulo N of each value. The program can now read these transformed data by treating the byte array as an input stream. A similar example would apply for some kind of output transformation.
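The modulo example can be sketched roughly as follows. To keep the example self-contained, the integers come from an int array rather than a binary file, which is an assumption of this sketch; the class and method names are also illustrative. The transformed values are placed in a byte array and then read back through a ByteArrayInputStream as if they were arriving from a file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Illustrative sketch: transform values in memory, then read the byte array as an input stream.
final class ByteArrayTransform {

    /** Stores each value modulo n in a byte array, then reads the results back as a stream. */
    static int[] remaindersViaStream(int[] values, int n) {
        try {
            // Stage 1: write the transformed values into an in-memory byte array.
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buffer)) {
                for (int value : values) {
                    out.writeInt(value % n);   // note: % yields a negative result for negative values
                }
            }

            // Stage 2: treat the byte array as an input stream.
            int[] result = new int[values.length];
            try (DataInputStream in =
                     new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                for (int i = 0; i < result.length; i++) {
                    result[i] = in.readInt();
                }
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(remaindersViaStream(new int[] {10, 21, 32}, 7)));
    }
}
```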

The StringReader and StringWriter classes provide methods for treating Strings and StringBuffers as I/O streams. These methods can be useful for performing certain data conversions.
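As one small example of such a conversion, the sketch below reads a String one character at a time through a StringReader and collects an upper-case copy in a StringWriter. The conversion chosen, and the class and method names, are assumptions made for illustration; any character-by-character transformation would follow the same pattern.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;

// Illustrative sketch: a String treated as an input stream, with the result built by a StringWriter.
final class StringStreamDemo {

    /** Copies input to a new String, converting each character to upper case on the way. */
    static String toUpperCase(String input) {
        StringWriter out = new StringWriter();
        try (StringReader in = new StringReader(input)) {
            int c;
            while ((c = in.read()) != -1) {        // read() returns -1 at the end of the String
                out.write(Character.toUpperCase(c));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toUpperCase("streams and files"));
    }
}
```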