View from the Fringe

Tuesday, September 18, 2007

Eric Burke suppressing a desire to get a Mac?

My friend and colleague, Eric Burke, recently altered his blog to show the date of the posting with a calendar 'icon' rendered by a CSS style-sheet. While I appreciate his frustration with the inconsistent handling of CSS by different browsers, that is not what caught my eye. Here is his graphic:

And here is the icon for iCal (the calendar program on Mac OS X):

What do you think? Is he suppressing a desire to get a Mac?

Wednesday, August 08, 2007

Gilstrap Estimation Curves

I am constantly producing estimates. How long will this project take? How long will that feature take? How long will it take us to test things? And my favorite: How long will it take to fix the bugs we haven't found?

Other than the last, nonsense question, these are legitimate issues to address in the process of software development. Unfortunately, our industry (and some others) tends to be very simplistic in its estimates.

There are many ways to go wrong in estimating. Perhaps the most common is attempting to pretend that people are 'resources' and that everyone is interchangeable. But that's a topic for another day. Today, I want to look at a way of estimating that is simple enough to use yet isn't one-dimensional.

One of the biggest problems with estimating is trying to incorporate some notion of risk in the estimate. Is your estimate iron-clad, where you are 99.99999% sure you'll meet the deadline (and consequently there's a chance you can beat it), or will you be working flat-out trying to meet the date with only a limited chance of making it? This is one of the most common problems in projects: one party gives an aggressive estimate and another party assumes it's conservative. We need a better way to communicate the likelihood of completing a task in a given amount of time.

To capture this idea, we need to represent the work with more than a single number, and we need to incorporate risk. Happily, this lends itself to a two-dimensional graph, with effort (time) as the x-axis and likelihood of completion (what I'll call "confidence") as the y-axis, like this:

The line represents your confidence, at the time you make the estimate, that you will solve the problem with the effort specified. Depending on the scope of the work, the effort line might be measured in hours, days, weeks, months, or even years. The basic idea is that it will take some period of time to understand the unknowns of a problem, which is the early, low-confidence part of the curve. Once the basic problem is understood, work proceeds apace and confidence goes up rapidly. The tail of the curve represents the fact that things might go well and we could get done earlier than expected. Let's take an example.

Assume you are asked to implement a new kind of cache for your system. It involves a "least recently used" (LRU) strategy along with a maximum age for cache entries. Instead of saying "it'll take 10 man-days of effort", as if it could not take only 9 and would never take 11, you can graph it, like this:

This says that you think there's about a 2% chance of completing the work in 1 man-day, a 20% chance in 4 man-days, a 90% chance in 8 man-days, etc. You are not saying you'll be done in a specific number of man-days. Instead, you provide an estimate of the likelihood you'll complete the work in a particular number of man-days. This is useful because it helps provide some insight into the overall risk you see in the work.
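For readers who prefer code to graphs, here is a minimal sketch of the same idea as a data structure. Only the three data points come from the example above; the class name EstimationCurve, the implicit starting point of zero confidence at zero effort, the linear interpolation between points, and the flat extension past the last point are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

/** A confidence curve: effort in man-days mapped to the estimated chance of being done. */
final class EstimationCurve {
    // Effort (man-days) -> confidence in [0.0, 1.0], kept sorted by effort.
    private final TreeMap<Double, Double> points = new TreeMap<>();

    EstimationCurve() {
        // Assumption: with no effort spent there is no chance of being done.
        points.put(0.0, 0.0);
    }

    /** Record one point of the curve; returns this so calls can be chained. */
    EstimationCurve point(double manDays, double confidence) {
        if (manDays < 0.0 || confidence < 0.0 || confidence > 1.0) {
            throw new IllegalArgumentException("effort must be >= 0 and confidence in [0, 1]");
        }
        points.put(manDays, confidence);
        return this;
    }

    /** Confidence of finishing within the given effort, interpolated linearly. */
    double confidenceAt(double manDays) {
        if (manDays < 0.0) {
            throw new IllegalArgumentException("effort must be >= 0");
        }
        Map.Entry<Double, Double> lo = points.floorEntry(manDays);
        Map.Entry<Double, Double> hi = points.ceilingEntry(manDays);
        if (hi == null) {
            // Assumption: past the last recorded point the curve stays flat.
            return lo.getValue();
        }
        double x0 = lo.getKey(), y0 = lo.getValue();
        double x1 = hi.getKey(), y1 = hi.getValue();
        if (x1 == x0) {
            return y0; // exactly on a recorded point
        }
        return y0 + (manDays - x0) / (x1 - x0) * (y1 - y0);
    }
}
```

With the points (1, 0.02), (4, 0.20) and (8, 0.90) from the cache example, confidenceAt(6) comes out at roughly 0.55 under the linear-interpolation assumption, that is, about a 55% chance of finishing within six man-days.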

This is not to say that all problems follow this curve. Some may in fact be so difficult that they never approach 100%. For example, if you thought there was only a 50% chance you would actually be able to come up with a solution to a problem, you might have a graph like this:

Similarly, if you think the problem is well-understood, your graph might look like this:

At first this might seem odd. But if you really understand the problem you'll be able to estimate the required effort very accurately. This means there's very little chance of getting done faster than your estimate, but it also means there's very little chance you'll run long.

Perhaps this is a well-known thing in project management circles, but I've never seen anything like it in over twenty years in industry. I plan to use these estimation curves to better communicate my estimates to others.

Tuesday, July 31, 2007

RemoteIterator, episode 2

Last time, I lamented the lack of a standard RemoteIterator interface in Java. I went on to show a version that showed up not long ago in Jackrabbit (an implementation of JSR 170) and the version I've recreated several times in the past. Let's take a quick look at them again.

Jackrabbit's:


public interface RemoteIterator extends Remote {

    long getSize() throws RemoteException;

    void skip(long items) throws NoSuchElementException, RemoteException;

    Object[] nextObjects() throws IllegalArgumentException, RemoteException;

}

The RemoteIterator interface I've recreated a number of times over the years:


public interface RemoteIterator<T> extends Remote {

    public boolean hasMore() throws RemoteException;

    public List<T> next( int preferredBatchSize ) throws RemoteException;

}

Let's go through the methods individually:

long getSize() throws RemoteException

This seems problematic to me. Perhaps in Jackrabbit, it is reasonable to assume that you will always know the size of the collection to be iterated. But in many situations, this is not the case. For example, if the iterator is ultimately coming from a database cursor and the collection is large, we want to avoid having to run two SQL queries or store the entire collection in memory (which may be prohibitively expensive). I can see a derived interface, perhaps named BoundedRemoteIterator or something, but not in this interface.
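The derived interface floated above might look something like the following sketch. BoundedRemoteIterator is the name suggested in the paragraph; treating getSize as the total number of items in the iteration, and the ListBackedIterator class used to exercise it, are assumptions for illustration. The base interface repeats the two-method version shown earlier.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.ArrayList;
import java.util.List;

/** Base interface: no size, so it also fits cursors of unknown length. */
interface RemoteIterator<T> extends Remote {
    boolean hasMore() throws RemoteException;
    List<T> next(int preferredBatchSize) throws RemoteException;
}

/** Hypothetical derived interface for sources that know their size up front. */
interface BoundedRemoteIterator<T> extends RemoteIterator<T> {
    /** Assumption: the total number of items in the iteration, not the number remaining. */
    long getSize() throws RemoteException;
}

/** In-memory stand-in: a List always knows its size, so it can be bounded. */
final class ListBackedIterator<T> implements BoundedRemoteIterator<T> {
    private final List<T> items;
    private int position = 0;

    ListBackedIterator(List<T> items) {
        this.items = new ArrayList<T>(items); // defensive copy
    }

    @Override public long getSize() { return items.size(); }

    @Override public boolean hasMore() { return position < items.size(); }

    @Override public List<T> next(int preferredBatchSize) {
        // Always hand back at least one item; long arithmetic avoids int overflow.
        long wanted = (long) position + Math.max(1, preferredBatchSize);
        int end = (int) Math.min((long) items.size(), wanted);
        List<T> batch = new ArrayList<T>(items.subList(position, end));
        position = end;
        return batch;
    }
}
```

A cursor-backed implementation would implement only RemoteIterator and never have to answer getSize at all.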

void skip(long items) throws NoSuchElementException, RemoteException

I love this idea. Wish I'd thought of it myself. A trivial implementation would do whatever next( items ) would do and simply not return the items. A smart implementation can save time and space by not actually generating the objects (since they won't be returned). The only change I would make is to have a return value of the number of items skipped and get rid of the NoSuchElementException. If I ask to skip 50 items and only 29 more exist, I would get a return of 29 and no exception. Admittedly, if the interface included a getSize operation, throwing NoSuchElementException is less strange. But even in that case, it seems too rigid to consider skipping more items than are left an 'exception'.
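As a rough sketch of the "trivial implementation" with the proposed return value, assuming the server side holds a plain java.util.Iterator (the Skipper class and its method are hypothetical names, not part of Jackrabbit or any other library):

```java
import java.util.Iterator;

/** Hypothetical server-side helper for the count-returning skip described above. */
final class Skipper {
    private Skipper() {}

    /**
     * Advance the iterator by up to {@code items} elements, discarding them.
     * Running out of elements is not an error; the caller just gets a smaller count.
     * @return the number of elements actually skipped
     */
    static long skip(Iterator<?> source, long items) {
        long skipped = 0;
        while (skipped < items && source.hasNext()) {
            source.next(); // the trivial approach: produce the element, then drop it
            skipped++;
        }
        return skipped;
    }
}
```

Asking this helper to skip 50 items when only 29 remain returns 29 and leaves the iterator exhausted. A smarter implementation, for example one sitting on a database cursor, could move the cursor without materializing the objects at all.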

boolean hasMore() throws RemoteException

At first, you might think this is a wasteful operation. Why make a call to find out if there are more items? This results in 'extra' remote calls instead of simply asking for the next item. However, if we imagine the items themselves as quite large (e.g. multi-megapixel images), we start to see a use for this API. If we were keeping the getSize operation, this might not be necessary, but again, getSize() requires state tracking on the client side that can substantially complicate the client code.

Object[] nextObjects() throws IllegalArgumentException, RemoteException

This is similar to my next operation discussed below, except that it (1) throws IllegalArgumentException, (2) returns an array of objects instead of a List of a specific type, and (3) does not take a preferred size.

(1) I'm guessing that the getSize operation is intended to be called and then the caller is expected to know when all items in the iterator are consumed. This creates a problem if the client consumes different parts of the iterator in different parts of the code. Now, they have to pass around 2 or 3 items (remote iterator, number of items consumed so far, and possibly the total size) to do this. This is clunky to say the least.

(2) Jackrabbit may have been specified before Java 1.5 (which introduced generics) was in wide use, or may feel the need for backward compatibility, hence the use of an array of Object. The downside to this is the loss of type-safety and loss of an API that can be optimized. See the discussion of next below.

(3) See the discussion of next below for the rationale for a preferred batch size.

List<T> next( int preferredBatchSize ) throws RemoteException

By returning a List<T>, we not only introduce type information/safety, we also allow for further optimization. The List returned could be a simple implementation such as LinkedList or ArrayList. Or, it could be a smart List that collaborates with the RemoteIterator to delay retrieving its contents or retrieve the contents in separate threads. There are lots of possibilities introduced by avoiding raw Java arrays.

Allowing the client to pass a preferred batch size provides many optimization opportunities. For example, if displaying the items in a GUI, they can ask for 'page-sized' numbers of items. By making the preferred size a suggestion, it allows an implementation leeway to ignore requests that are deemed unmanageably small or large for the kind of information being returned.
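To illustrate the "suggestion, not a contract" idea, here is a minimal sketch of a server-side next that clamps the requested size. The BatchingCursor class and the MIN_BATCH and MAX_BATCH limits are hypothetical; sensible values depend entirely on the kind of item being returned.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical server-side cursor that treats the batch size as a hint. */
final class BatchingCursor<T> {
    // Assumed limits, chosen only for illustration.
    static final int MIN_BATCH = 10;
    static final int MAX_BATCH = 500;

    private final Iterator<T> source;

    BatchingCursor(Iterator<T> source) {
        this.source = source;
    }

    /** Clamp the client's preferred size into [MIN_BATCH, MAX_BATCH]. */
    static int effectiveBatchSize(int preferredBatchSize) {
        return Math.max(MIN_BATCH, Math.min(MAX_BATCH, preferredBatchSize));
    }

    /** Return up to the clamped number of items; fewer if the source runs out. */
    List<T> next(int preferredBatchSize) {
        int limit = effectiveBatchSize(preferredBatchSize);
        List<T> batch = new ArrayList<T>();
        while (batch.size() < limit && source.hasNext()) {
            batch.add(source.next());
        }
        return batch;
    }
}
```

A client asking for 3 items here still gets up to 10, and one asking for a million gets at most 500, so callers should rely on the size of the returned List rather than on the number they requested.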

So, where does this leave us? Our RemoteIterator now looks like this:


import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.List;

/**
 * Interface for a remote iterator that returns entries of a particular type.
 * @see java.util.Iterator
 * @see java.rmi.Remote
 */
public interface RemoteIterator<T> extends Remote {

    /**
     * Determine if there are more items to be iterated across
     * @return true if there are more items to be
     * iterated, false otherwise.
     * @throws RemoteException if a problem occurs with
     * communication or on the remote server.
     */
    boolean hasMore() throws RemoteException;

    /**
     * Skip some number of items.
     * @param items the number of items to skip at maximum. If there are fewer
     * items than this left, all remaining items are skipped.
     * @return the number of items actually skipped. This will equal <code>items</code>
     * unless there were not that many items left in the iteration.
     * @throws RemoteException if a problem occurs with
     * communication or on the remote server.
     */
    long skip(long items) throws RemoteException;

    /**
     * Get some number of items
     * @param preferredBatchSize a suggested number of items to return. The implementation
     * is not required to honor this request if it will prove too difficult.
     * @return a List of the next items in the iteration, in iteration order.
     * @throws RemoteException if a problem occurs with
     * communication or on the remote server.
     */
    List<T> next( int preferredBatchSize ) throws RemoteException;

}
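To show how client code might consume this interface, here is a minimal sketch of a client-side front-end that hides the batching behind a java.util.Iterator. The names RemoteIteratorAdapter and InMemoryRemoteIterator are hypothetical, the interface is trimmed to the two methods the adapter needs, and wrapping RemoteException in an unchecked exception is just one possible choice.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Trimmed copy of the interface above: only the methods this adapter uses. */
interface RemoteIterator<T> extends Remote {
    boolean hasMore() throws RemoteException;
    List<T> next(int preferredBatchSize) throws RemoteException;
}

/** Hypothetical client-side front-end: one remote call per batch, not per item. */
final class RemoteIteratorAdapter<T> implements Iterator<T> {
    private final RemoteIterator<T> remote;
    private final int batchSize;
    private Iterator<T> current = Collections.<T>emptyList().iterator();

    RemoteIteratorAdapter(RemoteIterator<T> remote, int batchSize) {
        this.remote = remote;
        this.batchSize = batchSize;
    }

    @Override public boolean hasNext() {
        try {
            // Go back to the server only when the local batch is used up.
            // Assumes the server does not return empty batches forever.
            while (!current.hasNext() && remote.hasMore()) {
                current = remote.next(batchSize).iterator();
            }
            return current.hasNext();
        } catch (RemoteException e) {
            // Assumption: surface remote failures as unchecked exceptions.
            throw new IllegalStateException("remote iteration failed", e);
        }
    }

    @Override public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }

    @Override public void remove() {
        throw new UnsupportedOperationException("remote iteration is read-only");
    }
}

/** In-memory stand-in for a server-side iterator, for trying the adapter locally. */
final class InMemoryRemoteIterator<T> implements RemoteIterator<T> {
    private final Iterator<T> source;

    InMemoryRemoteIterator(List<T> items) {
        this.source = new ArrayList<T>(items).iterator();
    }

    @Override public boolean hasMore() { return source.hasNext(); }

    @Override public List<T> next(int preferredBatchSize) {
        int limit = Math.max(1, preferredBatchSize);
        List<T> batch = new ArrayList<T>();
        while (batch.size() < limit && source.hasNext()) {
            batch.add(source.next());
        }
        return batch;
    }
}
```

With an adapter like this, the rest of the client can use an ordinary while (it.hasNext()) loop and never see the batch boundaries.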

I'm still wondering why this isn't part of the JDK.

Thursday, July 19, 2007

RemoteIterator, where art thou?

I've written my fair share of remote iterators over the years. The concept is simple enough: provide an interface for iterating over a set of items that come from a remote source. The harder part is making a good abstraction that can still perform well.

Even so, it surprises me that, as far as I can tell, there hasn't been a standardized form of this. There's been discussion for a long time, with my earliest memories stemming from the JINI mailing lists back in 2001. Yet there's nothing in the core as of Java 6. There are random postings on various groups on the subject through the years, but no actual API that I can find.

Recently, I was teaching my course in Java performance tuning and discussing remote iterators, when a student mentioned that Jackrabbit (an implementation of JSR 170) has an RMI remoting API called Jackrabbit JCR-RMI which includes a RemoteIterator. As of version 1.0.1, it looks like this:


public interface RemoteIterator extends Remote {

    long getSize() throws RemoteException;

    void skip(long items) throws NoSuchElementException, RemoteException;

    Object[] nextObjects() throws IllegalArgumentException, RemoteException;

}

Here is the RemoteIterator interface I've recreated a number of times over the years:


public interface RemoteIterator<T> extends Remote {

    public boolean hasMore() throws RemoteException;

    public List<T> next( int preferredBatchSize ) throws RemoteException;

}

These two APIs are strikingly similar. They both extend Remote, they both provide a means to get some number of the next items to be retrieved (mine with generics and theirs without), and they are both quite small. Yet they are also strikingly different: Jackrabbit's API has both getSize and skip while mine doesn't.

Next time, I'll describe what I see as the major differences and what I like about each. I'll also compare the client-side front-end for remote iterators from Jackrabbit and the one I've created. In the meantime, consider it a challenge to comment about everything I was going to bring up and more.

Monday, July 02, 2007

Random Acts of Blog

I've been subscribed to Brian Coyner's blog for some time. I had this URL:

feed://beanman.wordpress.com/atom.xml

Recently, I was quite surprised when Brian seemed to stop blogging about software and began blogging about bands I had no idea he liked. But I chalked it up to a sudden burst of music interest.

Then, today, I tried to go to his blog and got this:

It turns out to be a blog feed for People Magazine's StyleWatch. WTF? A bit of digging and it would seem that the feed above no longer works. But instead of giving me an error, I'm directed to a random wordpress.com blog page instead. Talk about non-intuitive!

Freaky....

Thursday, June 28, 2007

Friday Java Quiz

My friend, Weiqi Gao, frequently posts a Friday Java Quiz. I ran into a situation today that seems like a good one.

Suppose you have the following two classes:


public class Test {
    public static void main(String[] args) {
        System.err.println( Foo.class.getName() );
        System.err.println( "Testing, 1, 2, 3..." );
        new Foo();
    }
}


public class Foo {
    static {
        System.err.println( "Foo here." );
    }
    public Foo() {
        System.err.println( "New Foo!" );
    }
}

Without running this program, do you know what the output will be?

Thursday, May 17, 2007

Kyle Cordes on tools

Kyle Cordes wants people to view a presentation by Linus Torvalds discussing distributed version control tools. I plan to watch the talk soon, but a comment Kyle made in his blog entry really rang true. It's something I've been saying (less eloquently) for a long time:

"I have heard it said that these are all 'just tools' which don’t matter, you simply use whatever the local management felt like buying. That is wrong: making better tool choices will make your project better (cheaper, faster, more fun, etc.), making worse tool choices will make your project worse (more expensive, slower, painful, higher turnover, etc.)"

Friday, April 06, 2007

I've been tagged

There is apparently a blog-tag game going around, and Jeff Brown tagged me.

I practice Yang-style short form T'ai Chi Ch'uan.

I've been having fun lately making home-made no-knead bread.

I enjoy woodworking. I'm currently tuning my latest large tool purchase. It took me about a year to save enough to get my bandsaw. I even build things.

I've used a Mac since the original 128k version introduced in 1984.

Perhaps because I didn't drink much as a teenager, I never developed a taste for beer. I do enjoy single malt scotch.

Now I'm tagging Weiqi, Brad, Dave, Brian, and Melben.

Tuesday, March 20, 2007

Too much C++

(with apologies to Monty Python)

A customer enters a software shop.

Mr. Praline: 'Ello, I wish to register a complaint.

The owner does not respond.

Mr. Praline: 'Ello, Miss?

Owner: What do you mean "miss"?

Mr. Praline: I'm sorry, I have a cold. I wish to make a complaint!

Owner: We're closin' for lunch.

Mr. Praline: Never mind that, my lad. I wish to complain about this programming language what I purchased not half an hour ago from this very boutique.

Owner: Oh yes, the, uh, the Danish C++...What's, uh...What's wrong with it?

Mr. Praline: I'll tell you what's wrong with it, my lad. 'E's crap, that's what's wrong with it!

Owner: No, no, 'e's uh,...it's resting.

Mr. Praline: Look, matey, I know a crap language when I see one, and I'm looking at one right now.

Owner: No no it's not crap, it's, it's restin'! Remarkable language, the Danish C++, idn'it, ay? Beautiful syntax!

Mr. Praline: The syntax don't enter into it. It's stone crap.

Owner: Nononono, no, no! 'E's resting!

Mr. Praline: All right then, if it's restin', I'll wake him up! (shouting at the cage) 'Ello, Mister Polymorphic Language! I've got a lovely fresh computer for you if you show...

owner hits the cage

Owner: There, he moved!

Mr. Praline: No, he didn't, that was you hitting the cage!

Owner: I never!!

Mr. Praline: Yes, you did!

Owner: I never, never did anything...

Mr. Praline: (yelling and hitting the cage repeatedly) 'ELLO POLYMORPH!!!!! Testing! Testing! Testing! Testing! This is your nine o'clock alarm call!

Takes language out of the cage and thumps its head on the counter. Throws it up in the air and watches it plummet to the floor.

Mr. Praline: Now that's what I call a crap language.

Owner: No, no.....No, 'e's stunned!

Mr. Praline: STUNNED?!?

Owner: Yeah! You stunned him, just as he was wakin' up! Danish C++'s stun easily, major.

Mr. Praline: Um...now look...now look, mate, I've definitely 'ad enough of this. That language is definitely deceased, and when I purchased it not 'alf an hour ago, you assured me that its total lack of movement was due to it bein' tired and shagged out following a prolonged standards meeting.

Owner: Well, it's...it's, ah...probably pining for a common runtime.

Mr. Praline: PININ' for a COMMON RUNTIME?!?!?!? What kind of talk is that? Look, why did he fall flat on his back the moment I got 'im home?

Owner: The Danish C++ prefers keepin' on its back! Remarkable language, id'nit, squire? Lovely syntax!

Mr. Praline: Look, I took the liberty of examining that language when I got it home, and I discovered the only reason that it had been sitting on its perch in the first place was that it had been NAILED there.

pause

Owner: Well, o'course it was nailed there! If I hadn't nailed that language down, it would have nuzzled up to those computers, broke 'em apart with its beak, and VOOM! Feeweeweewee!

Mr. Praline: "VOOM"?!? Mate, this language wouldn't "voom" if you put four million volts through it! 'E's bleedin' demised!

Owner: No no! 'E's pining!

Mr. Praline: 'E's not pinin'! 'E's passed on! This language is no more! He has ceased to be! 'E's expired and gone to meet 'is maker! 'E's a stiff! Bereft of life, 'e rests in peace! If you hadn't nailed 'im to the perch 'e'd be pushing up the daisies! 'Is processes are now 'istory! 'E's off the map! 'E's kicked the bucket, 'e's shuffled off 'is mortal coil, run down the curtain and joined the bleedin' choir invisible!! THIS IS AN EX-LANGUAGE!!

pause

Owner: Well, I'd better replace it, then. (he takes a quick peek behind the counter) Sorry squire, I've had a look 'round the back of the shop, and uh, we're right out of languages.

Mr. Praline: I see. I see, I get the picture.

Owner: I got Java.

pause

Mr. Praline: Pray, does it support pointers?

Owner: Nnnnot really.

Mr. Praline: WELL IT'S HARDLY A BLOODY REPLACEMENT, IS IT?!!???!!?

Owner: N-no, I guess not. (gets ashamed, looks at his feet)

Mr. Praline: Well.

pause

Owner: (quietly) D'you.... d'you want to come back to my place an' dereference some pointers?

Mr. Praline: (looks around) Yeah, all right, sure.

Thursday, March 15, 2007

The ever-leaky abstractions of Microsoft

I was using Windows XP inside Parallels today and got a message from Windows that I had never seen before:

How on earth could you design an operating system that can detect that there isn't enough virtual memory, expand the virtual memory pool automatically, and yet deny applications' requests for virtual memory at the same time?