Thursday, July 31, 2008

AOPing and loving it.

I've always thought of AOP in terms of tools like AspectJ, which manipulate bytecode (I don't really love the idea of bytecode injection), and never really as a design pattern that could be implemented in plain Java. But lo and behold, there are plenty of Java-based implementations (I believe based on dynamic proxies). I've been using Spring's implementation of the AOP Alliance API: http://aopalliance.sourceforge.net/doc/index.html
Using AOP really allows you to cut down on non-model-related code in your model. One of the classic examples is logging: how great would it be to not see any logging code in your model code? We do lots of reporting on our models, and sometimes it's very difficult to hook into the model at certain points to do the reporting.
We just started adding AOP hooks to our factories, which has been great. Now we just pass a method interceptor to a factory, and the factory takes care of wrapping the appropriate objects with our interceptor. On the reporting side we keep a reference to the same method interceptor. As our code runs, the method interceptor collects state that we can report on. It's like a totally non-invasive implementation of the observer pattern. I love it.
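Here's a rough sketch of that factory idea. To keep it self-contained it uses a plain JDK dynamic proxy (java.lang.reflect.Proxy) instead of Spring and the AOP Alliance MethodInterceptor, so treat it as an illustration of the shape rather than our actual code. Account, SimpleAccount, CallRecorder, AccountFactory and RecorderDemo are all made-up names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model interface (JDK proxies can only wrap interfaces).
interface Account {
    void deposit(int amount);
    int balance();
}

// Hypothetical model class: note there is no reporting code in here.
class SimpleAccount implements Account {
    private int balance;
    public void deposit(int amount) { balance += amount; }
    public int balance() { return balance; }
}

// The reporting side keeps a reference to this and reads it later.
class CallRecorder {
    private final List<String> calls = new ArrayList<>();
    void record(String methodName) { calls.add(methodName); }
    List<String> calls() { return calls; }
}

// The factory wraps every object it hands out with the recorder.
class AccountFactory {
    private final CallRecorder recorder;

    AccountFactory(CallRecorder recorder) { this.recorder = recorder; }

    Account create() {
        Account real = new SimpleAccount();
        InvocationHandler handler = (proxy, method, args) -> {
            recorder.record(method.getName());    // observe, don't mutate
            try {
                return method.invoke(real, args); // delegate to the real object
            } catch (InvocationTargetException e) {
                throw e.getCause();               // surface the model's own exception
            }
        };
        return (Account) Proxy.newProxyInstance(
                Account.class.getClassLoader(),
                new Class<?>[] { Account.class },
                handler);
    }
}

class RecorderDemo {
    public static void main(String[] args) {
        CallRecorder recorder = new CallRecorder();
        Account account = new AccountFactory(recorder).create();
        account.deposit(5);
        account.balance();
        System.out.println(recorder.calls()); // prints [deposit, balance]
    }
}
```

The caller only ever sees an Account, and the model class never learns it is being watched. That is the part that feels like a free observer pattern.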
Suddenly we have a whole new world of reporting and testing that we can do which would have been very difficult previously.  
Seriously, think about testing. Say you're unlucky enough to be stuck with legacy code that's hard to work with, but lucky enough that the developers used factories. If you can slip in a mechanism for wrapping objects, suddenly what was difficult to test is easy! You just intercept a method call on an object you care about and verify that its state is correct, or whatever the heck you want to do. I'm finding it über powerful.
But with über power comes über responsibility. An implementation of a method interceptor might look like so:
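public Object invoke(MethodInvocation i) throws Throwable {
    // getThis() is the object the intercepted method was called on
    Foo foo = (Foo) i.getThis();
    if (foo.isAFoo()) {
        System.out.println("yay found a foo");
    }
    // proceed() runs the real method; pass its result back to the caller
    return i.proceed();
}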
I just want to point out here you suddenly have access to an instance of foo and also all the arguments that were passed in.  So woe unto thee who changes state whilst intercepting a method.  Because good luck debugging that.  Besides that troubling point, this stuff is awesome.
Yeah, "lo and behold" and "woe unto thee". I wonder how many idioms I can slip into a blog post without running out of steam.

Wednesday, July 16, 2008

Java Set / HashSet API

Recently I was confronted with a problem.  I have two Sets of objects
S1: f1, f2, f3
S2: f1, f2, f3
Where S1.f1 != S2.f1 and S1.f1.equals(S2.f1) are both true.
The two Sets each represent the same object identities, but they are being evaluated in different contexts. At some point later in time I need to find out the differences between S1.f1 and S2.f1.
Now Set has some great methods on it like:
contains(Object o)
remove(Object o)
And then there's HashSet which is a set backed by a sweet sweet HashMap, which should make it fast to do contains() + remove().
So what I need to do at some point is:
for( F f : S1) {
    F fLocal = S2.get(f);
   //do some comparisons between fLocal and f
}
but I can't. Neither the Set nor the HashSet API has a get(Object o). They make the assumption that if .equals() returns true there are no differences between the two objects. I think I disagree with this premise. At least in my recent experience, .equals() is talking about the ID of an object. With that ID I wish to track the state of that object evaluated in different contexts.
Maybe this is abuse of .equals() and I should get over myself. But I wish Java's APIs left it up to me to decide the meaning of .equals() and were a little more open ended.
Maybe it's not part of the core API, but at least they could have implemented this in HashSet.
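For what it's worth, one workaround is to index S2 in a HashMap that maps each element to itself. Then map.get(f) hands back the S2 instance that .equals() f, which is the get(Object o) I was wishing for. This is just a sketch under my own assumptions: Item stands in for F, and Item, EqualLookup and LookupDemo are made-up names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for F: equality is by id only, state is ignored.
final class Item {
    final String id;
    final int state;

    Item(String id, int state) {
        this.id = id;
        this.state = state;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Item && id.equals(((Item) o).id);
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }
}

final class EqualLookup {
    // Index a set so each element can be fetched by any object equal to it.
    static <T> Map<T, T> index(Set<T> set) {
        Map<T, T> byEquality = new HashMap<>();
        for (T element : set) {
            byEquality.put(element, element);
        }
        return byEquality;
    }
}

class LookupDemo {
    public static void main(String[] args) {
        Set<Item> s1 = new HashSet<>(Arrays.asList(new Item("f1", 1), new Item("f2", 2)));
        Set<Item> s2 = new HashSet<>(Arrays.asList(new Item("f1", 10), new Item("f2", 20)));

        Map<Item, Item> s2ByEquality = EqualLookup.index(s2);
        for (Item f : s1) {
            Item fLocal = s2ByEquality.get(f); // the S2 instance that equals f
            System.out.println(f.id + ": " + f.state + " vs " + fLocal.state);
        }
    }
}
```

It costs an extra map per set, but lookups stay hash-speed and .equals() keeps meaning "same ID".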

Tuesday, July 8, 2008

hub and spoke vs chained architecture (or adding more Antlr tree walkers)

After writing a Graphviz Dot language parser and one tree walker, I made the decision to go for a hub and spoke architecture rather than a chained architecture, where the hub is a Dot file and the spokes are multiple tree walkers.
I had already parsed the Dot language to populate an internal domain object.  Next I needed to render this in a Java graphing library that allowed me to specify positional information for each node (I chose the prefuse library (http://prefuse.org)).  I also knew I wanted to parse everything from one dot file that looked like so:
graph g {
a [x=3,y=100]
b [x=4,y=150]

a--b;
}
I saw two basic options:  
  1. Update my current tree walker to read node and node attribute information, update my domain model to contain that information.  Then write a converter from my domain model -> the prefuse graph object.
  2. Write a new tree walker that knows about node and node attribute information, and directly populates a prefuse graph object.
Option #1 seemed like a chained approach, where #2 was more like a hub and spoke. Clearly context is king, but in my context, going for a hub and spoke approach really seemed like the best way to go, and looked like soooooo much less work (Option #1 has like a million pieces where #2 had like 0, really which would you do?).
I implemented option #2 and after working out the fiddly bits with prefuse it only cost 2 hours to build and test.  
I think that's pretty fast.
It seems that Antlr really encourages going towards a hub and spoke architecture and in this case I think that turned out to be a really good thing.  Now I have two spokes... how to build more??!?!?!  If two is good more must be better right?
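To make the hub and spoke shape concrete, here's a toy sketch with no Antlr in it at all. DotTree is a hand-rolled stand-in for the parsed tree, and TreeWalker, EdgeWalker, PositionWalker and HubAndSpokeDemo are names I made up for illustration; the real walkers are generated from Antlr tree grammars and look nothing like this. The point is only that each spoke reads the same tree and never talks to another spoke.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical hand-rolled stand-in for the tree the Dot parser produces (the hub).
final class DotTree {
    final Map<String, Map<String, Integer>> nodes = new LinkedHashMap<>();
    final List<String[]> edges = new ArrayList<>();

    DotTree node(String name, int x, int y) {
        Map<String, Integer> attrs = new LinkedHashMap<>();
        attrs.put("x", x);
        attrs.put("y", y);
        nodes.put(name, attrs);
        return this;
    }

    DotTree edge(String from, String to) {
        edges.add(new String[] { from, to });
        return this;
    }
}

// A spoke: reads the shared tree and builds whatever its consumer needs.
interface TreeWalker<R> {
    R walk(DotTree tree);
}

// Spoke 1: only cares about connectivity (think: the domain model).
final class EdgeWalker implements TreeWalker<List<String>> {
    public List<String> walk(DotTree tree) {
        List<String> result = new ArrayList<>();
        for (String[] e : tree.edges) {
            result.add(e[0] + "--" + e[1]);
        }
        return result;
    }
}

// Spoke 2: only cares about node positions (think: the renderer).
final class PositionWalker implements TreeWalker<Map<String, int[]>> {
    public Map<String, int[]> walk(DotTree tree) {
        Map<String, int[]> positions = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Integer>> n : tree.nodes.entrySet()) {
            Map<String, Integer> attrs = n.getValue();
            positions.put(n.getKey(), new int[] { attrs.get("x"), attrs.get("y") });
        }
        return positions;
    }
}

class HubAndSpokeDemo {
    public static void main(String[] args) {
        // Same content as the dot file above: a [x=3,y=100], b [x=4,y=150], a--b
        DotTree tree = new DotTree().node("a", 3, 100).node("b", 4, 150).edge("a", "b");
        System.out.println(new EdgeWalker().walk(tree));                 // prints [a--b]
        System.out.println(new PositionWalker().walk(tree).get("b")[1]); // prints 150
    }
}
```

Adding a third spoke means writing one more walker and touching nothing else, which is roughly why option #2 felt so cheap.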
One mistake I made: I didn't do any unit testing of the dot language parser. The tree walkers are well tested, so the parser is at least tested via integration. I haven't spent enough time figuring out how to test the parser, so I guess that's the next step that should have been the first step. Oh well.

Wednesday, July 2, 2008

Graphing Libraries

I'm looking into graphing libraries right now and I've been working with Graphviz because some folks here use it. Another language / tool that also got mentioned was GraphML, which is, from what I've been told, a little more powerful than Graphviz.

Here's a sample of the Graphviz dot language:

graph g {
  A -- B;
}

Guess what, that creates two nodes with circles around them and connects them with a line.

To do that in GraphML:

<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"  
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
     http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
  <graph id="G" edgedefault="undirected">
    <node id="a"/>
    <node id="b"/>
    <edge source="a" target="b"/>
  </graph>
</graphml>
heh. ouch.

Their homepage says it all:

Unlike many other file formats for graphs, GraphML does not use a custom syntax. Instead, it is based on XML and hence ideally suited as a common denominator for all kinds of services generating, archiving, or processing graphs.
I guess so, but what's better to optimize for, a service using your graphing language, or the poor schmuck who has to use it? Being more powerful is great, but it looks annoying to type. That's really what I want API / language designers to optimize for. How annoying will it be to type :)

Tuesday, July 1, 2008

Antlr - I finally found an excuse to use it

Yeah, Antlr's cool stuff.

I wrote a parser for the Graphviz dot language (http://www.graphviz.org/doc/info/lang.html) and spit out an AST containing the edges and subgraphs. I took that AST and populated a graph in our domain model. And honestly, once I got over the hump of learning Antlr, it was really easy.

I didn't purely look for an excuse to use the technology (okay, a little bit). We needed a representation for a graph. Why invent one when dot is so nice and will give you a nice visualization? So now I have a DSL for creating my domain model (dot) that also gives me a nice visualization of the kind of model I'm creating. Two different projections of the same artifact. That's really cool I think.

Unit Testing Silver Bullet

http://www.infoq.com/news/2008/06/Unit-Testing-Silver-Bullet

I liked this article, I thought it was interesting to think about different ways to enforce code quality. The article's main focus was comparing TDD to Clean Room Software Development. If you don't read the article: Clean Room sounds like incredibly intense code reviews where you have to 'prove' mathematically to your peers that your code works. But I think comparing Clean Room to automated testing is comparing apples to oranges.

How would Clean Room help with maintaining legacy or bad code? What's the point in having a room full of people pore over a 1000 line method that only weirdo geniuses can understand? If you're not working with legacy code, then Clean Room sounds great if you have a room of people who want to pore over your code, but who has that liberty? At all the million (read 3) places I've worked, intensive code reviews have never been a priority.

Even if you worked somewhere that code reviews were a priority, and everything was peer reviewed constantly and you didn't have any legacy code, then great, forget automated testing! But the minute you don't have all those things you need something else. Peer review is great, but it's very brittle. I think that's one of the advantages of automated testing, you have an artifact that lives on and provides at least some value.

The article did have a good point though: what is the value of automated testing + TDD? I think that's really hard to quantify. Personally, automated testing has rarely been useful for finding bugs (but when it has I do jump up and down and tell EVERYONE I can, it's awesome!). I have found it very helpful for learning, whether it's an API or legacy code I have to maintain. I also find it very helpful for designing my code (aka TDD). I'm totally addicted to this.

These days I stub out all my classes and start writing tests. I write tests for everything I want to say, then fill in the blanks, then write more tests. It's a very top down approach, but it's working for me. And what really sells me on it is that it has a great side effect: automated tests that every once in a while make me jump up and down because they found a bug.

 