Monday, May 05, 2008

Groovy Type Handling is Broken

Here's a very simple Java program snippet:

List<Integer> list = new ArrayList<Integer>();
Collection<Integer> col = list;
list.add( 2 );
list.add( 1 );
list.add( 0 );
System.out.println( list );
col.remove( 0 );

Run this and you'll get what you'd expect. Before using the col reference to remove the integer/Integer with value 0 (zero), you see:

[2, 1, 0]

Then, after calling remove on the collection, we see the integer/Integer with value zero is gone (that's the one we said to remove):

[2, 1]

Groovy says that it allows you to use types and will honor them at runtime. But it doesn't.

Take the same snippet of code and run it as a Groovy script. Remember that the Collection interface has a remove method, and that method expects you to pass it an object which it will remove from the collection. So we would expect to see the same result as in Java. But we don't. Instead, we get this:

[2, 1, 0]
[1, 0]

Groovy has removed the Integer with a value of 2! It did this because of Groovy's 'multi-methods'. Groovy looked at the real type of col, which is an ArrayList. It then found a remove method that takes an integer/Integer. That method removes the Nth item from the list. It used that method and removed the zero-th item (the one with a value of 2).

But remember, I invoked the remove. method through the Collection interface. The collection interface does not have a method taking an integer/Integer but it does have a remove method taking the object to be removed!

This is a truly awful result. It removes our ability to reason about how code will behave by looking at the types involved. In order to know how this will behave, I cannot make any use of the type information; in fact the type information is downright misleading. Even worse, as the type system evolves and new classes are introduced, new methods with entirely different signatures may get invoked.

This approach to handling types is essentially broken. The claim that Groovy honors types (even the weakened form of honoring it by doing casts at runtime) is flat out wrong.

Groovy needs to either start honoring type information, or get rid of it.


Khalil said...

Calling overloaded methods with the same arity from dynamic languages is pretty tricky, javac goes into some quite convuleted logic to find the right method. Now there is nothing wrong with wishing ill fate on Groovy with such a name they do deserve it, I myself have never downloaded it, but your post is a tad melodramatic, and well this things when they happen have a clear diagnostic it is called a bug, you could have simply filed a bug report.

Weiqi Gao said...

Groovy method dispatch is always done according to the dynamic type of the receiver and parameters.

My understanding is that the optional typing information is only used limitedly (to prevent assignment of a non-Foo instance to a Foo typed variable, for example) and then discarded. The rest of the computation is done as if all variables are declared with def.

In particular, Groovy does not support static typing.

Here's a mailing list thread that explains the situation:


Brian Gilstrap said...

I can't post a bug report, Khalil, because this is documented as a feature. It's not a feature, and it means that declaring types in Groovy is fundamentally misleading. Hence, my statement that Groovy should just stop providing the tiny bit of 'type support' that it has because it's a net loss.

Brian Gilstrap said...

I read the thread, Weiqi, but didn't see any explanation that was remotely satisfying. What's the point of guaranteeing that something matches a particular type (interface or class) if you subsequently don't used that type to determine how to interact with the object? There's a very tiny value, perhaps, in that you know the object matches the type in question and is likely to behave in a manner consistent with that type.

But my example demonstrates how Groovy then goes on to do the wrong thing from a strong typing perspective. It does not invoke the remove method declared in the type of the variable. It invokes an entirely different remove method with different semantics.

One of the hardest things to do in software is to ensure that changes or new code introduced into a working system do not break the system. Relying on unit testing is not nearly enough. Especially when you consider the other dynamic code features of Groovy (e.g. the ability to synthesize code from inputs 'on the fly'), the multi-method behavior of Groovy strips away almost all of a developer's ability to reason about the effects of changes to an existing system. This leaves them dependent upon the very dubious assumption that they as the developer of the system will predict and write tests for all the usage cases users will throw at the software. The more popular the software, the less likely this is. The more complex the software, the less likely this is. The larger the software, the less likely this is. And we haven't even begun to discuss what happens when the development team has staff turnover and we get a developer who didn't write the system maintaining it. Yikes.

Having the type information there is incredibly misleading for anyone who has used a typed language before. It really shouldn't be there.

Note that I'm not saying you can't build working systems with Groovy. But the type syntax is a hazard, not a benefit. And the method dispatch paradigm shrinks the code size and team size for building reliable systems. Given Groovy's current semantics, I'd estimate Groovy becomes a liability when the team size exceeds 5 people (fewer if they aren't all top notch developers) and when the code size exceeds 250k (developer-written) lines of code. And to remain productive they need to not use any type information in the code.

Mario said...

Perhaps you should try JRuby ;-)

The code example in Groovy
The equivalent code example in JRuby

Michael Easter said...

I believe that this works:

col.remove( 0 as Object )

Sometimes Groovy needs some additional type info. I don't claim that this addresses your concern per se, or that it can match Java's strong typing.

I think it comes down to a semantic issue on what constitutes 'honouring' of Java types. Versus, say, 'support' for static types.

Peter Becker said...

I'm not sure that I put the blame on Groovy here. I'd partially blame the JDK for bad overloading (same method name but different semantics) and definitely Java 5 for the stupid idea of autoboxing -- I really wish they wouldn't have introduced that feature and I normally have Eclipse set up to warn me whenever it happens so I can avoid it.

Brian Gilstrap said...

I see overloading a bit different than you do, Peter. The argument types are part of the method signature. However, that's not the fundamental issue.

The real issue is that Groovy picked a method not in the interface I was using. It groped around the implementation of the object and found a 'more specific' method than the one present in the interface of the type it was claiming to support. It didn't pick the 'wrong' method in the Collection interface, it picked a method not in the Collection interface.

This is simply awful. No longer can a developer look at the API of an interface or class and know which method they are invoking. This weakens Groovy's support for types so much that it becomes almost completely useless. This is not strong typing as the term is commonly understood. Type conformity might be an okay description of what Groovy supports, but not strong typing.

Peter Becker said...

Brian, I agree that the parameter types are part of the signature and that it is technically valid to overload that way. My point is that it is usually bad style to overload if the difference of the method's semantics is much bigger than the difference in the type semantics. And I'd claim that the difference between int and Object is far too small to allow overloading at all (from a style point of view) since an int should really be an Object and is not only due to short-sightedness of Java's design ;-)

Groovy's approach is in most cases exactly what Java does: the actual method called is determined at runtime. Java does it all the time: if you call a method that is overridden in a subtype, then it won't matter if you call it through a reference to a supertype. Calling remove(Integer.valueOf(0)) in your example will in both languages call the remove(Object) method of the ArrayList class. Of course there you would expect that the semantics of Collection.remove(Object) are obeyed, although even that is not guaranteed in the JDK (there are many cases where equals() is overridden wrong, and some Map versions ignore more things).

What you are trying to do is to resolve the ambiguity of "0" that is introduced by autoboxing by using a different reference type. It seems the Java compiler looks at the reference type to determine the method signature, so it gets the result you want. Of course the object type stays the same, and since in dynamic languages the actual object definitions can change, Groovy can not do that, it has to rely on the object type. There is a perfect match there, so it picks that.

Of course that is not what you wanted. But the real problem is that the "0" became ambigious when they introduce autoboxing. What if you put a Long 0 and an Integer 0 into a list and remove "0"? I am not sure myself -- I suspect you always hit the Integer and never the Long (even if there is no Integer). It's confusing, thus bad.

I would also suspect that your way of resolving the ambiguity works only by coincidence. It's probably in the JLS, but I don't think it is designed for that problem. My advice is to stay away from autoboxing. The ambiguity it causes can bite you a lot, even if you do only Java. The only place I use it is testcode, where I need to create lots of sample data. That's the only case where the convenience outweights the risks.

Brian Gilstrap said...

Peter, I'm afraid you still aren't getting the point. Autoboxing and unboxing has nothing to do with this. Groovy picked a method not in the interface of the object I was invoking. I invoked the remove operation on col, a variable typed to Collection. There is only one remove method in that interface, which takes a reference to the object to be removed. Groovy completely ignored the type and invoked an entirely different method not even in the Collection interface.

Peter Becker said...

Autoboxing and unboxing is relevant since without it the original ambiguity (do you call remove(Object) or remove(int)) would not exist in the first place.

The method Groovy picked is on the interface of the object, since the object type is ArrayList. The reference you used to call the method on is of type Collection, but that is a whole different story. The reference type is what the compiler sees, the object type is what the runtime sees.

Normally both Java and Groovy dispatch based on the object type, i.e. the runtime type information. That's necessary to make every method polymorphic (in C++ speak every method is virtual): only at runtime it is possible to determine which implementation in a hierarchy should be called. Both languages will behave exactly the same.

The problem in your example is that the actual method signature to use is ambiguous. Is remove(Object) or remove(int) to be called. That resolution happens in Java at compile time, and since the compiler looks at the reference type it can't find the remove(int) one, so it autoboxes (it would throw an error without autoboxing).

Groovy is a dynamically typed language, which means it does not do these things at compile time, but at runtime. At runtime the reference type is irrelevant, in fact I'd claim that it should be allowed for the compiler to remove the indirection through the Collection interface completely. Since the reference type has no relevance, the object type is used for the dispatch, and the most specific match is then remove(int). It's not what you expected, but it is not only consistent with being a dynamnically typed language, but also the only possible approach if you want to be able to change classes at runtime.

You might find similar issues if you have methods that are distinguishable only by types in a hierachy, e.g. the supertype declares method(Number) and the subtype adds method(Integer). Not sure about that, but none of these cases will be as confusing as what is possible with autoboxing.

Brian Gilstrap said...

Peter, you do a good job of describing how Groovy behaves. And in that explanation you provide confirmation of my assertion. You write:

"At runtime the reference type is irrelevant"

This is not the behavior of a strongly typed language.

Note that I don't mean Groovy isn't behaving the way it was written to behave. What I'm saying is that Groovy should not behave in this manner OR Groovy should not make the claim that it supports strong typing. The very limited support it provides for typing (optionally doing casts to make sure objects assigned to references conform to the reference type) is so small a part of the conventional meaning assigned to the term 'strongly typed' that it is extremely misleading to make that claim for Groovy.

Peter Becker said...

Groovy first of all claims to be a dynamically typic language. From what I've seen (and I'm not really into Groovy) they claim to also support restricting variable types statically (I'd actually call it a limited form of manifest typing). That is rather independent of the question of static vs. dynamic method dispatch, and as I said befoer, Java mostly dispatches dynamically -- so in some way the behaviour you observe in Java is a bit inconsistent.

BTW: don't confuse strong typing with static typing. E.g. Java has mixed static and dynamic typing, and they sometimes differ, e.g. the JVM knows arryas are not covariant and that null is an instance of the absurd type, while the compiler let's you use arrays covariantly and thinks null is an instance of Object -- both features that make me claim the dynamic typing is stronger in those cases. Stronger typing means how much the type system can express/how many bugs it can avoid, while static/dynamic typing is about the when.

Maybe I should stop ranting, though. Java's type system (and its severe brokeness) is a pet peeve of mine and it probably showed. I personally don't care enough about Groovy to have real issues with it, it just seemed to me that a lot of what you are running into here is based on that stupid idea called autoboxing. And I stick with that notion unless someone provides an example as confusing without the use of autoboxing. I think that's impossible without breaking superclass contracts when overriding.

Meyrick Kirby said...

I couldn't easily paste my thoughts on this problem as a comment, so I've placed them on my own blog.

See what you think.

David said...

I'd estimate Groovy becomes a liability when the team size exceeds 5 people (fewer if they aren't all top notch developers) and when the code size exceeds 250k (developer-written) lines of code.

I'd suggest that your development team is your primary liability if they've written 250k+ lines of Groovy. Either they are extremely poor Groovy programmers, or the project was completely inappropriate for Groovy. I've never even seen a Java project larger than that for any reason other than bad architecture driving code bloat.

I'm not sure what I think about the example as a technical issue, but I do question it as a practical issue. Could I play devil's advocate and challenge you to devise a real world problem, with a solution implemented with idiomatic Groovy, where this issue has potential to arise? I'm too fixated on the lack of a context requiring polymorphic collection handling, the unnecessary lines of code, and extraneous strong typing (given the lack of context) to be able to focus in on the practical problem. Perhaps the code fragment that illustrates a Groovy problem should be written in idiomatic Groovy?

Given a simple start:

def list = [2, 1, 0]
println (list - 0)

How do we get into the situation you described? I'd dare to presuppose that unnecessary complexity is a prerequisite to demonstrating the problem in action. (FYI, I've written many thousands of lines of Groovy, so I'm not approaching this from a naive perspective of artificial simplicity. But problems that can only be demonstrated with artificial complexity only prove that artificial complexity is bad.)

Brian Gilstrap said...

David, you are thinking too narrowly. Groovy effectively does not honor interfaces at runtime. It does not limit itself to the methods available on an interface via which I'm invoking a method. As a result of this, I lose the ability to reason about what a particular piece of software is doing. This ability to understand the ramifications of a change without having to examine large swaths of source code is crucial to not just the development of medium-sized and large systems, but especially important to their maintenance and evolution.

In my experience and opinion, this sort of you-can't-reason-about-what-it-will-do-unless-you-run-it style of programming breaks down very rapidly. Instead of 5 developers and 250k lines of code, I'd estimate more like 2 developers and 75k lines of code. Or at best, 3 developers and 100k lines of code.

That's a real worry for non-trivial systems that have a lifetime of more than a year or so. I guess if you know you'll be scraping the entire application in a couple of years, this might not be such a problem. But most medium-sized to large applications I know of stick around for decades.