Tuesday, July 31, 2007

RemoteIterator, episode 2

Last time, I lamented the lack of a standard RemoteIterator interface in Java. I then showed two versions: one that appeared not long ago in Jackrabbit (an implementation of JSR 170, the Content Repository for Java API) and one I've recreated several times in the past. Let's take a quick look at them again.

Jackrabbit's:

public interface RemoteIterator extends Remote {

long getSize() throws RemoteException;

void skip(long items) throws NoSuchElementException, RemoteException;

Object[] nextObjects() throws IllegalArgumentException, RemoteException;

}

The RemoteIterator interface I've recreated a number of times over the years:

public interface RemoteIterator<T> extends Remote {

public boolean hasMore() throws RemoteException;

public List<T> next( int preferredBatchSize ) throws RemoteException;

}

Let's go through the methods individually:

  • long getSize() throws RemoteException

    This seems problematic to me. Perhaps in Jackrabbit it is reasonable to assume that you will always know the size of the collection to be iterated. But in many situations, this is not the case. For example, if the iterator is ultimately coming from a database cursor and the collection is large, we want to avoid having to run two SQL queries (one to count, one to fetch) or store the entire collection in memory (which may be prohibitively expensive). I can see getSize() in a derived interface, perhaps named BoundedRemoteIterator or something, but not in the base interface.

  • void skip(long items) throws NoSuchElementException, RemoteException

    I love this idea. Wish I'd thought of it myself. A trivial implementation would do whatever next( items ) would do and simply not return the items. A smart implementation can save time and space by not actually generating the objects (since they won't be returned). The only change I would make is to return the number of items actually skipped and get rid of the NoSuchElementException. If I ask to skip 50 items and only 29 more exist, I would get a return value of 29 and no exception. Admittedly, if the interface included a getSize operation, throwing NoSuchElementException would be less strange. But even in that case, it seems too rigid to treat skipping more items than are left as an 'exception'.

  • boolean hasMore() throws RemoteException

    At first, you might think this is a wasteful operation. Why make a call just to find out whether there are more items? It results in 'extra' remote calls compared with simply asking for the next item. However, if we imagine the items themselves as quite large (e.g. multi-megapixel images), we start to see a use for it: the client can check cheaply before committing to an expensive transfer. If we were keeping the getSize operation, hasMore might not be necessary, but relying on getSize() requires the client to track how many items it has consumed, which can substantially complicate the client code.

  • Object[] nextObjects() throws IllegalArgumentException, RemoteException

    This is similar to my next operation discussed below, except that (1) it throws IllegalArgumentException, (2) it returns an array of objects instead of a List of a specific type, and (3) it does not take a preferred batch size.

    (1) I'm guessing that the client is expected to call getSize and then keep track of when all items in the iterator have been consumed. This creates a problem if the client consumes different parts of the iterator in different parts of the code: now it has to pass around two or three things (the remote iterator, the number of items consumed so far, and possibly the total size). This is clunky, to say the least.

    (2) Jackrabbit's interface may have been specified before Java 1.5 (which introduced generics) was in wide use, or its authors may feel the need for backward compatibility, hence the use of an array of Object. The downside is the loss of type safety and of room to optimize behind the API. See the discussion of next below.

    (3) See the discussion of next below for the rationale for a preferred batch size.

  • List<T> next( int preferredBatchSize ) throws RemoteException

    By returning a List<T>, we not only introduce type information/safety, we also allow for further optimization. The List returned could be a simple implementation such as LinkedList or ArrayList. Or, it could be a smart List that collaborates with the RemoteIterator to delay retrieving its contents or retrieve the contents in separate threads. There are lots of possibilities introduced by avoiding raw Java arrays.

    Allowing the client to pass a preferred batch size provides many optimization opportunities. For example, a client displaying the items in a GUI can ask for 'page-sized' numbers of items. Making the preferred size a suggestion gives an implementation leeway to ignore requests that it deems unmanageably small or large for the kind of information being returned.
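
To make the difference between a trivial and a smart skip concrete, here is a minimal sketch. Everything in it is hypothetical: SkippingCursor, loader, and loads are names invented for illustration, and the total is assumed to be known up front only to keep the example short. It counts how many items actually get built so the two strategies can be compared.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

/** Hypothetical server-side cursor; not part of Jackrabbit or the JDK. */
class SkippingCursor<T> {
    private final int total;              // assumed known, only to keep the sketch small
    private final IntFunction<T> loader;  // builds item i; stands in for an expensive fetch
    private int position = 0;
    private int loads = 0;                // how many items were actually built

    SkippingCursor(int total, IntFunction<T> loader) {
        this.total = total;
        this.loader = loader;
    }

    /** Builds and returns up to batchSize items, in iteration order. */
    List<T> next(int batchSize) {
        int end = (int) Math.min((long) total, (long) position + Math.max(0, batchSize));
        List<T> batch = new ArrayList<>();
        while (position < end) {
            batch.add(loader.apply(position++));
            loads++;
        }
        return batch;
    }

    /** Trivial skip: do what next() does and throw the result away. */
    long trivialSkip(long items) {
        int n = (int) Math.max(0, Math.min(items, Integer.MAX_VALUE));
        return next(n).size();
    }

    /** Smarter skip: advance the position without building anything. */
    long smartSkip(long items) {
        long skipped = Math.max(0, Math.min(items, (long) total - position));
        position += (int) skipped;
        return skipped;
    }

    /** Number of items built so far. */
    int loads() {
        return loads;
    }
}
```

With 29 items left, both trivialSkip(50) and smartSkip(50) report 29 rather than throwing. The difference is that the trivial version builds all 29 items and discards them, while the smart version builds none.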


So, where does this leave us? Our RemoteIterator now looks like this:

import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.List;

/**
* Interface for a remote iterator that returns entries of a particular type.
* @see java.util.Iterator
* @see java.rmi.Remote
*/
public interface RemoteIterator<T> extends Remote {

/**
* Determine if there are more items to be iterated across.
* @return true if there are more items to be
* iterated, false otherwise.
* @throws RemoteException if a problem occurs with
* communication or on the remote server.
*/
boolean hasMore() throws RemoteException;

/**
* Skip some number of items.
* @param items the number of items to skip at maximum. If there are fewer
* items than this left, all remaining items are skipped.
* @return the number of items actually skipped. This will equal <code>items</code>
* unless there were not that many items left in the iteration.
* @throws RemoteException if a problem occurs with
* communication or on the remote server.
*/
long skip(long items) throws RemoteException;

/**
* Get some number of items.
* @param preferredBatchSize a suggested number of items to return. The implementation
* is not required to honor this request if it will prove too difficult.
* @return a List of the next items in the iteration, in iteration order.
* @throws RemoteException if a problem occurs with
* communication or on the remote server.
*/
List<T> next( int preferredBatchSize ) throws RemoteException;

}
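
As a rough illustration of how this interface might be implemented and consumed, here is a minimal sketch. The names RemoteIteratorSketch, IteratorBacked, readPages, and MAX_BATCH are hypothetical, and so is the cap of 100. The sketch declares skip as returning long so the count can always match a long request, which is an assumption of the sketch. Exporting the object over RMI is left out, so the calls here are purely local.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical holder so the sketch compiles as a single file. */
class RemoteIteratorSketch {

    /** Same shape as the interface above; skip returns long in this sketch (assumption). */
    interface RemoteIterator<T> extends Remote {
        boolean hasMore() throws RemoteException;
        long skip(long items) throws RemoteException;
        List<T> next(int preferredBatchSize) throws RemoteException;
    }

    /** Wraps a local java.util.Iterator, so the total size is never needed. */
    static final class IteratorBacked<T> implements RemoteIterator<T> {
        private static final int MAX_BATCH = 100; // assumed server-side cap
        private final Iterator<T> source;

        IteratorBacked(Iterator<T> source) {
            this.source = source;
        }

        @Override
        public synchronized boolean hasMore() {
            return source.hasNext();
        }

        @Override
        public synchronized long skip(long items) {
            long skipped = 0;
            // Stop quietly when the source runs dry; no exception is thrown.
            while (skipped < items && source.hasNext()) {
                source.next();
                skipped++;
            }
            return skipped;
        }

        @Override
        public synchronized List<T> next(int preferredBatchSize) {
            // Treat the requested size as a hint and clamp it to a manageable range.
            int batch = Math.max(1, Math.min(MAX_BATCH, preferredBatchSize));
            List<T> result = new ArrayList<>(batch);
            while (result.size() < batch && source.hasNext()) {
                result.add(source.next());
            }
            return result;
        }
    }

    /** Client side: read everything, one page-sized batch per call. */
    static <T> List<List<T>> readPages(RemoteIterator<T> it, int pageSize) throws RemoteException {
        List<List<T>> pages = new ArrayList<>();
        while (it.hasMore()) {
            pages.add(it.next(pageSize));
        }
        return pages;
    }
}
```

A real GUI client would more likely fetch one page at a time as the user scrolls, rather than collecting every page as readPages does. The loop is only meant to show that hasMore and next are enough to drive the iteration without knowing the total size.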


I'm still wondering why this isn't part of the JDK.

2 comments:

Laird Nelson said...

When do you find use for this abstraction? I, too, built a remote iterator at one point (I like yours better--the skip() is indeed a good addition), thinking that I'd use it for streaming data from the server to the client--but then I realized that pretty much any real-world implementation would be fetching chunks of data from the server synchronously anyhow (the paging pattern). That is, whether you're talking about JEE or Hibernate or some such, you're almost always talking about something on the server side grabbing a few (or a hundred, or a thousand) objects from the database/legacy system/muck and returning that little globule back to the caller, while the caller hangs out and waits. I am curious to know the situations in which you've found the need for a remote iterator.

Brian Gilstrap said...

I think that returning things in pages is a perfect example. If the 'page' of items is some number of items (e.g. rows in a list/table) then the client can request the proper number of items and retrieve only those. For example, if a 'page' holds 36 items, then the client can ask for the next 36 items. It might get more or fewer, but it takes very little tuning to make this work nicely.

However, this does run into issues of state management if you're talking about stateless J2EE apps (or stateless server-side apps in general). If you are a purist who wants zero state, this pattern won't work very well for you. If you have a richer client communicating directly via the RemoteIterator, or a server-side environment where you don't have to be purely stateless, it works well.