Friday, October 26, 2007

Making BufferedReader Iterable

I was reading recently about a issues around reading an entire file into a string. I pretty much never do that, since I like my code to be robust even in the face of very large files. But it did get me thinking about reading in lines of a file one at a time. I do have reason to do that on occasion.

Originally, I might have done it something like this:

File file = ...
BufferedReader br = new BufferedReader(new FileReader(file));
String line = br.readLine();
while ( line != null ) {
// Do something with line
line = br.readLine();
}

That's not bad, but I've been working with Ruby lately and really like the terse yet easy-to-read nature of the code. We can improve the above code with a for loop:

File file = ...
BufferedReader br = new BufferedReader(new FileReader(file));
for (String line = br.readLine(); line != null; line = br.readLine()) {
// Do something with line
}

Better, but still not great. Now, what if the BufferedReader were Iterable? It isn't, but we can wrap an Iterable around it. First, let's look at usage:

File file = ...
BufferedReaderIterable bri = new BufferedReaderIterable(new BufferedReader(new FileReader(file)));
for (String read : bri) {
// Do something with the line
}

Very nice. So, all we need is to implement BufferedReaderIterable. Happily, it's quite simple:

public class BufferedReaderIterable implements Iterable<String> {

private Iterator<String> i;

public BufferedReaderIterable( BufferedReader br ) {
i = new BufferedReaderIterator( br );
}
public Iterator iterator() {
return i;
}

private class BufferedReaderIterator implements Iterator<String> {
private BufferedReader br;
private java.lang.String line;

public BufferedReaderIterator( BufferedReader aBR ) {
(br = aBR).getClass();
advance();
}

public boolean hasNext() {
return line != null;
}

public String next() {
String retval = line;
advance();
return retval;
}

public void remove() {
throw new UnsupportedOperationException("Remove not supported on BufferedReader iteration.");
}

private void advance() {
try {
line = br.readLine();
}
catch (IOException e) { /* TODO */}
}
}
}

It would be nice if we could make the construction cleaner. What if BufferedReaderIterable had a constructor that took a File?


public class BufferedReaderIterable implements Iterable<String> {

private BufferedReader mine;
private Iterator<String> i;

public BufferedReaderIterable( BufferedReader br ) {
i = new BufferedReaderIterator( br );
}

public BufferedReaderIterable( File f ) throws FileNotFoundException {
mine = new BufferedReader( new FileReader( f ) );
i = new BufferedReaderIterator( mine );
}
public Iterator<String> iterator() {
return i;
}

private class BufferedReaderIterator implements Iterator<String> {
private BufferedReader br;
private String line;

public BufferedReaderIterator( BufferedReader aBR ) {
(br = aBR).getClass();
advance();
}

public boolean hasNext() {
return line != null;
}

public String next() {
String retval = line;
advance();
return retval;
}

public void remove() {
throw new UnsupportedOperationException("Remove not supported on BufferedReader iteration.");
}

private void advance() {
try {
line = br.readLine();
}
catch (IOException e) { /* TODO */}
if ( line == null && mine != null ) {
try {
mine.close();
}
catch (IOException e) { /* Ignore - probably should log an error */ }
mine = null;
}
}
}
}

Now, usage is even cleaner:

File file = ...
BufferedReaderIterable bri = new BufferedReaderIterable(file);
for (String read : bri) {
// Do something with the line
}

[Editorial Note: Updated to put the template angle brackets back in after Eric Burke noticed Blogger was eating them]

17 comments:

Laird Nelson said...

Hey, what's your (br = aBR).getClass() construct intended for? Just curious.

Unknown said...

Something like this should be part of java.io, IMHO. Is blogger eating "less-than" characters? I was wondering if this should be Iterable<String> ?

Brian Gilstrap said...

(br = aBR).getClass() is a fast way to "fail fast". It's not an option to have a null BufferedReader and work correctly. So, check that it's not null right up front, and do it as concisely as possible.

I used to do this:

public BufferedReaderIterator( BufferedReader aBR ) {
if ( aBR == null ) {
throw new NullPointerException( "..." );
}
br = aBR;

Then I moved to this:

public BufferedReaderIterator( BufferedReader aBR ) {
aBR.getClass(); // Assure non-null
br = aBR;
}

But then I realize we can do it much more concisely with:

public BufferedReaderIterator( BufferedReader aBR ) {
(br = aBR).getClass();
}

Brian Gilstrap said...

I agree with Eric. This should be part of the JDK. But at least we can work around it pretty easily.

And thanks for pointout that Blogger was eating the template stuff. I edited the post to reclaim the angle brackets.

Laird Nelson said...

I like it. You should probably have a way to close() the BufferedReader (or its wrapping Iterator) so that if someone decides to stop iterating in the middle you don't leak memory.

I'm guessing the reason BufferedReader was not made to be an Iterable<String> was: what makes iterating over a String more primal than iterating over chars, which all readers must be able to iterate over? Not supporting or defending this point of view; just saying.

Michael Easter said...

+1 on the post. Neat example of how another language can influence/motivate.

re: java.io Maybe but certainly in something like Jakarta Commons!

-1 on the null guard (the (br = aBR).getClass() trick). I know it is not entirely relevant to the post, but I find it interesting.

1. why not throw an IllegalArgumentException? That would illustrate a breach of contract versus a nebulous programming error (on the part of the client or this class).

2. The getClass() trick doesn't work when used via Groovy! Groovy creates a NullObject and its weird mojo would let that pass. In other circumstances, one might argue that this is a problem with Groovy but in the new age of JVM tunnelers, my guess is that we Java types will have to be very careful with idioms like this.

ps. The idiom works with JRuby and I think Jython. Note that an explicit check for null would work with everything.

Michael Easter said...

Oops! I double-checked and was burned by Groovy.

For (br = aBR).getClass()

(1) If this occurs in a Java class and is used by Groovy, then it still works. My mistake.

(2) If one tries this in pure Groovy, it won't work because of all the Groovy pixie dust.

I want to clarify because I thought #1 was breaking. I was mentally writing a blog post on this behaviour, actually, but then double-checked.

Apologies! I'm still -1 on it, but it does not suddenly 'break' when used elsewhere.

Anonymous said...

Reading the whole content of a file into a String is not that hard in java. You do not have to use a BufferedReader:

public static String readContent(File file) throws IOException {
FileInputStream in = new FileInputStream(file);
byte[] buffer = new byte[in.available()];
in.read(buffer);
return new String(buffer);
}

Brian Gilstrap said...
This comment has been removed by the author.
Brian Gilstrap said...

Indeed, reading the whole file in is not hard. It is, however, a bad idea from the perspective of robustness. Java strings have a maximum size that is easily smaller than many files you might want to read in.

And by the way, your proposed implementation is broken in at least two ways: (1) the available method reports how much of the file can be read without blocking - for large(ish) files, the amount available will not be the entire file; (2) the String constructor you use has undefined results if the contents of the file are not encoded with the default charset of the platform where the program is running (if it was produced on some other platform with a different default, for example).

Anonymous said...

Still waiting for the

catch (IOException e) { /* TODO */}

implementation.

Brian Gilstrap said...

So am I. When are you going to contribute it? :-)

phil varner said...

Very nice. I'll have to remember the getClass trick also.

Why did you decide to pass the BufferedReader to the private iterator rather than refer to it via IterableBufferedReader.this.mine ?

One of the annoying things about the java.util iterators is that their semantics are not precisely documented if you're doing anything other than single-threaded read-only forward iteration. It's a hard problem because there's no obvious correct behavior for some operations. It's interesting to look at the source of Itr and ListItr in AbstractList (written by Gafter and Bloch) and the iterators in the the Google Collections library. Both elegant solutions to the iterator pattern.

I believe the simple solution to the try-catch in advance is

line = null;

There's no difference between there not being a next line and not being able to retrieve it. This way, hasNext returns false, so the loop exits.

Sultan Rehman said...

Maybe I'm missing something here, but is there a reason why the Java enhanced loop isn't designed to support accepting an Iterator directly (in addition to the Iterable)? I mean why wouldn't be nice if you could say:

for ( String line : new LineIterator( ... ) ) { ... }

where LineIterator is just:

public LineIterator implements Iterator< String > { ... }

For something simple like your BufferedReaderIterator, it seems stupid that we need to write an Iterable container/wrapper for the Iterator, which does the real work anyway, just becoz Java's for loop is not smart enough to handle Iterator's directly.

Brian Gilstrap said...

By the way, the completed implementation of advance for the BufferedReaderIterator is:

private void advance() {
    BufferedReader temp = br;
     line = null;
     if (temp != null) {
         try {
             line = temp.readLine();
             if (line == null) close();
         }
         catch (IOException e) { close(); }
     }
}

This goes along with the implementation of close:

private void close() {
     BufferedReader temp = br;
     br = null;
     line = null;
     if (ownReader && temp != null) {
         try { temp.close(); }
         catch (IOException e) { /* oh well */ }
     }
}

phil varner said...

I realized I forgot the close later, and that just setting the line to null wasn't correct. The question is, what are the chances/circumstances that a BR will throw and IOE during readLine but successfully complete close without throwing one?

What is the decision behind using the temp BR in advance and close rather than calling br=null in a finally block?

yonatan wilkof said...

great and helpful post!