Tuesday, July 31, 2007

RemoteIterator, episode 2

Last time, I lamented the lack of a standard RemoteIterator interface in Java. I then showed the version that appeared not long ago in Jackrabbit (an implementation of JSR 170) alongside the version I've recreated several times in the past. Let's take a quick look at them again.

Jackrabbit's:

public interface RemoteIterator extends Remote {

    long getSize() throws RemoteException;

    void skip(long items) throws NoSuchElementException, RemoteException;

    Object[] nextObjects() throws IllegalArgumentException, RemoteException;

}

The RemoteIterator interface I've recreated a number of times over the years:

public interface RemoteIterator<T> extends Remote {

    public boolean hasMore() throws RemoteException;

    public List<T> next( int preferredBatchSize ) throws RemoteException;

}

Let's go through the methods individually:

  • long getSize() throws RemoteException

    This seems problematic to me. Perhaps in Jackrabbit, it is reasonable to assume that you will always know the size of the collection to be iterated. But in many situations, this is not the case. For example, if the iterator is ultimately coming from a database cursor and the collection is large, we want to avoid having to run two SQL queries or store the entire collection in memory (which may be prohibitively expensive). I can see getSize living in a derived interface, perhaps named BoundedRemoteIterator or something, but not in this one.

  • void skip(long items) throws NoSuchElementException, RemoteException

    I love this idea. Wish I'd thought of it myself. A trivial implementation would do whatever next( items ) would do and simply not return the items. A smart implementation can save time and space by not actually generating the objects (since they won't be returned). The only change I would make is to return the number of items skipped and get rid of the NoSuchElementException. If I ask to skip 50 items and only 29 more exist, I would get a return of 29 and no exception. Admittedly, if the interface included a getSize operation, throwing NoSuchElementException is less strange. But even in that case, it seems too rigid to consider skipping more items than are left an 'exception'.

  • boolean hasMore() throws RemoteException

    At first, you might think this is a wasteful operation. Why make a call to find out if there are more items? This results in 'extra' remote calls instead of simply asking for the next item. However, if we imagine the items themselves as quite large (e.g. multi-megapixel images), we start to see a use for this API. If we were keeping the getSize operation, this might not be necessary, but again, getSize() requires state tracking on the client side that can substantially complicate the client code.

  • Object[] nextObjects() throws IllegalArgumentException, RemoteException

    This is similar to my next operation discussed below, except (1) it throws IllegalArgumentException, (2) returns an array of objects instead of a List of a specific type, and (3) does not take a preferred size.

    (1) I'm guessing that the getSize operation is intended to be called and then the caller is expected to know when all items in the iterator are consumed. This creates a problem if the client consumes different parts of the iterator in different parts of the code. Now, they have to pass around 2 or 3 items (remote iterator, number of items consumed so far, and possibly the total size) to do this. This is clunky to say the least.

    (2) Jackrabbit may have been specified before Java 1.5 (and its generics) was in wide use, or may feel the need for backward compatibility, hence the use of an array of Object. The downside to this is the loss of type-safety and loss of an API that can be optimized. See the discussion of next below.

    (3) See the discussion of next below for the rationale for a preferred batch size.

  • List<T> next( int preferredBatchSize ) throws RemoteException

    By returning a List<T>, we not only introduce type information/safety, we also allow for further optimization. The List returned could be a simple implementation such as LinkedList or ArrayList. Or, it could be a smart List that collaborates with the RemoteIterator to delay retrieving its contents or retrieve the contents in separate threads. There are lots of possibilities introduced by avoiding raw Java arrays.

    Allowing the client to pass a preferred batch size provides many optimization opportunities. For example, a client displaying the items in a GUI can ask for 'page-sized' numbers of items. Making the preferred size a suggestion gives an implementation leeway to ignore requests that are deemed unmanageably small or large for the kind of information being returned.
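
The derived-interface idea from the getSize discussion above might look something like the following. This is only a minimal sketch: BoundedRemoteIterator is the name floated above, while the SizeHints helper and its -1 "unknown" convention are assumptions made up for illustration, not anything from Jackrabbit or the JDK.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.List;

// Local copy of the generic interface from earlier in this post,
// repeated so the sketch compiles on its own.
interface RemoteIterator<T> extends Remote {
    boolean hasMore() throws RemoteException;
    List<T> next(int preferredBatchSize) throws RemoteException;
}

// Hypothetical derived interface for sources that know their size up front.
interface BoundedRemoteIterator<T> extends RemoteIterator<T> {
    long getSize() throws RemoteException;
}

// Hypothetical client-side helper.
class SizeHints {
    // Returns the total size if the iterator advertises one, or -1 if unknown.
    static long sizeOrUnknown(RemoteIterator<?> it) throws RemoteException {
        if (it instanceof BoundedRemoteIterator) {
            return ((BoundedRemoteIterator<?>) it).getSize();
        }
        return -1L;
    }
}
```

With this split, cursor-backed iterators implement only the base interface and never have to compute a size, while clients that want a progress bar can check for the bounded variant.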


So, where does this leave us? Our RemoteIterator now looks like this:

import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.List;

/**
 * Interface for a remote iterator that returns entries of a particular type.
 * @see java.util.Iterator
 * @see java.rmi.Remote
 */
public interface RemoteIterator<T> extends Remote {

    /**
     * Determine if there are more items to be iterated across.
     * @return true if there are more items to be
     * iterated, false otherwise.
     * @throws RemoteException if a problem occurs with
     * communication or on the remote server.
     */
    boolean hasMore() throws RemoteException;

    /**
     * Skip some number of items.
     * @param items the maximum number of items to skip. If fewer items
     * than this are left, all remaining items are skipped.
     * @return the number of items actually skipped. This will equal <code>items</code>
     * unless there were not that many items left in the iteration.
     * @throws RemoteException if a problem occurs with
     * communication or on the remote server.
     */
    long skip(long items) throws RemoteException;

    /**
     * Get some number of items.
     * @param preferredBatchSize a suggested number of items to return. The implementation
     * is not required to honor this request if it will prove too difficult.
     * @return a List of the next items in the iteration, in iteration order.
     * @throws RemoteException if a problem occurs with
     * communication or on the remote server.
     */
    List<T> next( int preferredBatchSize ) throws RemoteException;

}
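
To make the skip and preferred-batch-size semantics concrete, here is a rough in-memory sketch. The class names (ListBackedRemoteIterator, RemoteIteratorDemo) and the MAX_BATCH cap are assumptions invented for this example; a real implementation would more likely wrap a database cursor and be exported over RMI. The local copy of the interface has skip return a long so the count can mirror a long request.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.ArrayList;
import java.util.List;

// Local copy of the interface so the sketch compiles on its own.
interface RemoteIterator<T> extends Remote {
    boolean hasMore() throws RemoteException;
    long skip(long items) throws RemoteException;
    List<T> next(int preferredBatchSize) throws RemoteException;
}

// Hypothetical in-memory implementation, for illustration only.
class ListBackedRemoteIterator<T> implements RemoteIterator<T> {
    private static final int MAX_BATCH = 100; // assumed server-side cap

    private final List<T> source;
    private int position = 0;

    ListBackedRemoteIterator(List<T> source) {
        this.source = new ArrayList<>(source);
    }

    @Override
    public synchronized boolean hasMore() {
        return position < source.size();
    }

    @Override
    public synchronized long skip(long items) {
        if (items <= 0) {
            return 0;
        }
        // Skip at most what is left and report how many were skipped.
        long skipped = Math.min(items, (long) source.size() - position);
        position += (int) skipped;
        return skipped;
    }

    @Override
    public synchronized List<T> next(int preferredBatchSize) {
        // Treat the requested size as a hint: clamp it to [1, MAX_BATCH].
        int batch = Math.max(1, Math.min(preferredBatchSize, MAX_BATCH));
        int end = Math.min(source.size(), position + batch);
        List<T> result = new ArrayList<>(source.subList(position, end));
        position = end;
        return result;
    }
}

class RemoteIteratorDemo {
    // Drains an iterator in page-sized batches, the way a GUI client might.
    static <T> List<T> drain(RemoteIterator<T> it, int pageSize) throws RemoteException {
        List<T> all = new ArrayList<>();
        while (it.hasMore()) {
            all.addAll(it.next(pageSize));
        }
        return all;
    }

    public static void main(String[] args) throws RemoteException {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            data.add(i);
        }
        RemoteIterator<Integer> it = new ListBackedRemoteIterator<>(data);
        System.out.println(it.skip(3));   // prints 3
        System.out.println(drain(it, 4)); // prints [3, 4, 5, 6, 7, 8, 9]
        System.out.println(it.skip(50));  // prints 0
    }
}
```

Note that the client never tracks a count or a total size: hasMore and the returned lists carry all the state it needs.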


I'm still wondering why this isn't part of the JDK.

Thursday, July 19, 2007

RemoteIterator, where art thou?

I've written my fair share of remote iterators over the years. The concept is simple enough: provide an interface for iterating over a set of items that come from a remote source. The harder part is making a good abstraction that can still perform well.

Even so, it surprises me that, as far as I can tell, there hasn't been a standardized form of this. There's been discussion for a long time, with my earliest memories stemming from the JINI mailing lists back in 2001. Yet there's nothing in the core as of Java 6. There are random postings on various groups on the subject through the years, but no actual API that I can find.

Recently, I was teaching my course in Java performance tuning and discussing remote iterators, when a student mentioned that Jackrabbit (an implementation of JSR 170) has an RMI remoting API called Jackrabbit JCR-RMI which includes a RemoteIterator. As of version 1.0.1, it looks like this:

public interface RemoteIterator extends Remote {

    long getSize() throws RemoteException;

    void skip(long items) throws NoSuchElementException, RemoteException;

    Object[] nextObjects() throws IllegalArgumentException, RemoteException;

}

Here is the RemoteIterator interface I've recreated a number of times over the years:

public interface RemoteIterator<T> extends Remote {

    public boolean hasMore() throws RemoteException;

    public List<T> next( int preferredBatchSize ) throws RemoteException;

}

These two APIs are strikingly similar. They both extend Remote, they both provide a means to get some number of the next items to be retrieved (mine with generics and theirs without), and they are both quite small. Yet they are also strikingly different: Jackrabbit's API has both getSize and skip while mine doesn't.

Next time, I'll describe what I see as the major differences and what I like about each. I'll also compare the client-side front-end for remote iterators from Jackrabbit and the one I've created. In the meantime, consider it a challenge to comment about everything I was going to bring up and more.

Monday, July 02, 2007

Random Acts of Blog

I've been subscribed to Brian Coyner's blog for some time. I had this URL:

feed://beanman.wordpress.com/atom.xml

Recently, I was quite surprised when Brian seemed to stop blogging about software and began blogging about bands I had no idea he liked. But I chalked it up to a sudden burst of music interest.

Then, today, I tried to go to his blog and got this:



It turns out to be a blog feed for People Magazine's StyleWatch. WTF? A bit of digging and it would seem that the feed above no longer works. But instead of giving me an error, I'm directed to a random wordpress.com blog page instead. Talk about non-intuitive!

Freaky....