Thursday, July 19, 2007

RemoteIterator, where art thou?

I've written my fair share of remote iterators over the years. The concept is simple enough: provide an interface for iterating over a set of items that come from a remote source. The harder part is making a good abstraction that can still perform well.

Even so, it surprises me that, as far as I can tell, there hasn't been a standardized form of this. It has been discussed for a long time; my earliest memories stem from the Jini mailing lists back in 2001. Yet there's nothing in the core as of Java 6. Scattered postings on the subject have appeared in various groups over the years, but I can't find an actual API.

Recently, I was teaching my course in Java performance tuning and discussing remote iterators, when a student mentioned that Jackrabbit (an implementation of JSR 170) has an RMI remoting API called Jackrabbit JCR-RMI which includes a RemoteIterator. As of version 1.0.1, it looks like this:

import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.NoSuchElementException;

public interface RemoteIterator extends Remote {

    long getSize() throws RemoteException;

    void skip(long items) throws NoSuchElementException, RemoteException;

    Object[] nextObjects() throws IllegalArgumentException, RemoteException;
}


Here is the RemoteIterator interface I've recreated a number of times over the years:

import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.List;

public interface RemoteIterator<T> extends Remote {

    boolean hasMore() throws RemoteException;

    List<T> next(int preferredBatchSize) throws RemoteException;
}

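A minimal sketch of a server-side implementation of the interface above follows. It assumes an ordinary java.util.Iterator as the data source. The class name IteratorBackedRemoteIterator and the MAX_BATCH cap are hypothetical. The sketch also treats preferredBatchSize as a hint that the server may clamp, which is one reasonable reading of "preferred".

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// The generic interface, repeated so this sketch compiles on its own.
interface RemoteIterator<T> extends Remote {
    boolean hasMore() throws RemoteException;
    List<T> next(int preferredBatchSize) throws RemoteException;
}

// Hypothetical server-side implementation backed by a local Iterator.
class IteratorBackedRemoteIterator<T> implements RemoteIterator<T> {
    // Hypothetical cap so a greedy client cannot force a huge reply.
    static final int MAX_BATCH = 500;

    private final Iterator<? extends T> source;

    IteratorBackedRemoteIterator(Iterator<? extends T> source) {
        this.source = source;
    }

    // synchronized: RMI may dispatch concurrent calls on different threads.
    @Override
    public synchronized boolean hasMore() {
        return source.hasNext();
    }

    @Override
    public synchronized List<T> next(int preferredBatchSize) {
        // Treat the size as a hint: clamp it to [1, MAX_BATCH] so each call
        // makes progress without building an unbounded batch.
        int limit = Math.max(1, Math.min(preferredBatchSize, MAX_BATCH));
        List<T> batch = new ArrayList<>(limit);
        while (batch.size() < limit && source.hasNext()) {
            batch.add(source.next());
        }
        // ArrayList is Serializable, so RMI can marshal it (given Serializable elements).
        return batch;
    }
}
```

To serve it over RMI, the object would be exported, for example with UnicastRemoteObject.exportObject(iterator, 0), and the resulting stub handed to the client. The element type must be Serializable for the batches to marshal.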

These two APIs are strikingly similar. They both extend Remote, they both provide a means to fetch the next batch of items (mine with generics, theirs without), and both are quite small. Yet they are also strikingly different: Jackrabbit's API has both getSize and skip, while mine has neither.
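The sketch below shows what getSize and skip buy a client. It is a minimal client-side sketch against the Jackrabbit-style interface. It assumes, without confirming it against JCR-RMI, that nextObjects() returns null once the iteration is exhausted. It also assumes that getSize() returns -1 when the size is unknown, as JCR's RangeIterator.getSize() does. RemoteIterators.drainFrom and the in-memory ArrayRemoteIterator are hypothetical names for illustration and are not part of JCR-RMI.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NoSuchElementException;

// Jackrabbit-style interface, repeated so this sketch compiles on its own.
interface RemoteIterator extends Remote {
    long getSize() throws RemoteException;
    void skip(long items) throws NoSuchElementException, RemoteException;
    Object[] nextObjects() throws IllegalArgumentException, RemoteException;
}

// Hypothetical client helper (not part of JCR-RMI).
final class RemoteIterators {
    private RemoteIterators() {}

    // Returns every item after the first `offset`, one server-chosen batch at a time.
    static List<Object> drainFrom(RemoteIterator remote, long offset)
            throws RemoteException {
        List<Object> result = new ArrayList<>();
        long size = remote.getSize(); // ASSUMPTION: -1 means "size unknown"
        if (size >= 0 && offset >= size) {
            return result; // nothing past the offset, so skip() is never called
        }
        if (offset > 0) {
            remote.skip(offset); // skipped items are discarded on the server
        }
        // ASSUMPTION: null (or an empty array) signals the end of the iteration.
        for (Object[] batch = remote.nextObjects();
                batch != null && batch.length > 0;
                batch = remote.nextObjects()) {
            result.addAll(Arrays.asList(batch));
        }
        return result;
    }
}

// In-memory stand-in for the remote object, for illustration only.
class ArrayRemoteIterator implements RemoteIterator {
    private final Object[] items;
    private final int batchSize;
    private int pos = 0;

    ArrayRemoteIterator(Object[] items, int batchSize) {
        this.items = items.clone();
        this.batchSize = batchSize;
    }

    @Override
    public long getSize() {
        return items.length;
    }

    @Override
    public void skip(long n) {
        if (n < 0 || n > items.length - pos) {
            throw new NoSuchElementException("cannot skip " + n + " items");
        }
        pos += (int) n;
    }

    @Override
    public Object[] nextObjects() {
        if (pos >= items.length) {
            return null; // matches the assumed end-of-iteration signal
        }
        int end = Math.min(items.length, pos + batchSize);
        Object[] batch = Arrays.copyOfRange(items, pos, end);
        pos = end;
        return batch;
    }
}
```

With a server-side skip, the skipped items never cross the wire. A client built only on the generic interface would have to fetch those items and throw them away.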

Next time, I'll describe what I see as the major differences and what I like about each. I'll also compare Jackrabbit's client-side front end for remote iterators with the one I've created. In the meantime, consider this a challenge: comment on everything I was going to bring up, and more.
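The sketch below gives a sense of what such a client-side front end involves. It works over the generic interface and is neither of the front ends to be compared. BatchingIterator and FakeRemoteIterator are hypothetical names; the latter is an in-memory stand-in that counts calls. The adapter presents a plain java.util.Iterator and goes to the server only when its local batch is used up. It wraps RemoteException in an unchecked exception, because Iterator's methods cannot throw checked ones.

```java
import java.io.UncheckedIOException;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// The generic interface, repeated so this sketch compiles on its own.
interface RemoteIterator<T> extends Remote {
    boolean hasMore() throws RemoteException;
    List<T> next(int preferredBatchSize) throws RemoteException;
}

// Hypothetical front end: an ordinary Iterator that fetches in batches.
class BatchingIterator<T> implements Iterator<T> {
    private final RemoteIterator<T> remote;
    private final int batchSize;
    private List<T> batch = Collections.emptyList();
    private int pos = 0;
    private boolean exhausted = false;

    BatchingIterator(RemoteIterator<T> remote, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.remote = remote;
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
        // Only contact the server once the local batch has been consumed.
        while (pos >= batch.size() && !exhausted) {
            try {
                if (remote.hasMore()) {
                    batch = remote.next(batchSize);
                    pos = 0;
                } else {
                    exhausted = true;
                }
            } catch (RemoteException e) {
                // RemoteException is an IOException; Iterator cannot throw it directly.
                throw new UncheckedIOException(e);
            }
        }
        return pos < batch.size();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return batch.get(pos++);
    }
}

// In-memory stand-in for a remote stub that counts "round trips".
class FakeRemoteIterator<T> implements RemoteIterator<T> {
    private final Iterator<T> source;
    int calls = 0;

    FakeRemoteIterator(List<T> items) {
        this.source = new ArrayList<>(items).iterator();
    }

    @Override
    public boolean hasMore() {
        calls++;
        return source.hasNext();
    }

    @Override
    public List<T> next(int preferredBatchSize) {
        calls++;
        List<T> out = new ArrayList<>();
        while (out.size() < preferredBatchSize && source.hasNext()) {
            out.add(source.next());
        }
        return out;
    }
}
```

Take ten items and a batch size of 4. Draining the adapter makes 7 calls on the stand-in: a hasMore/next pair per batch plus a final hasMore. A one-item-at-a-time hasMore/next loop would make 21 calls.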


ij said...

I've found that in a distributed system it may be useful to implement a Command(-like) pattern. For example, instead of iterating over the elements of a collection, I'd 'ask' the collection to perform something on its elements. With a naive implementation, even in the worst case, when the peer needs to call back on each iteration, it is just one call instead of two. Granted, your example is more bandwidth-conscious than a naive implementation since it provides for batching the remote calls, but you get the idea ...
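Below is a minimal sketch of the Command-like alternative. The client ships a serializable command to the server. The server applies the command to every element and returns only the result, so one remote call replaces the whole iteration. Command, RemoteCollection, ListRemoteCollection and CountLongerThan are hypothetical names for illustration.

```java
import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical command: sent to the server by value, applied to each element there.
interface Command<T, R extends Serializable> extends Serializable {
    void accept(T element); // called once per element, on the server
    R result();             // the summary returned to the client
}

// Hypothetical remote collection that runs commands instead of handing out iterators.
interface RemoteCollection<T> extends Remote {
    <R extends Serializable> R execute(Command<T, R> command) throws RemoteException;
}

// Server-side implementation sketch over a local list.
class ListRemoteCollection<T> implements RemoteCollection<T> {
    private final List<T> items;

    ListRemoteCollection(List<T> items) {
        this.items = new ArrayList<>(items);
    }

    @Override
    public synchronized <R extends Serializable> R execute(Command<T, R> command) {
        // Under RMI this is a deserialized copy of the client's command,
        // which is why the result is returned rather than read back from it.
        for (T item : items) {
            command.accept(item);
        }
        return command.result();
    }
}

// Example command: counts strings longer than a threshold.
class CountLongerThan implements Command<String, Integer> {
    private final int threshold;
    private int count = 0;

    CountLongerThan(int threshold) {
        this.threshold = threshold;
    }

    @Override
    public void accept(String s) {
        if (s.length() > threshold) {
            count++;
        }
    }

    @Override
    public Integer result() {
        return count;
    }
}
```

Under RMI the command is passed by value, so its class must be loadable on the server, either from the server's classpath or through RMI's dynamic class loading. That requirement is one of the costs of pushing work to the server.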

Brian Gilstrap said...

IJ, I agree that some sort of command(-like) pattern on the server side is a good alternative to remote iterators in many situations. But sometimes it is inappropriate to push the work back onto the server. This can be due to any of a number of factors:

* The server is too heavily loaded to add more to its workload
* The kind of work being done is specific to the client application, and not really appropriate for the server side
* The information retrieved from the service will be combined with information from other services, perhaps in a complicated fashion (filtering results based upon queries to other services, for example)
* Many different clients each want the work done just differently enough that a server-side implementation becomes too complicated
* etc.