BlogicBlog: View from the trenches

A blog about Java and XML, with a focus on troubleshooting issues and tools.

Wednesday, March 23, 2005

Re: Put down that decompiler - part two

In my previous article on this topic, I gave a scenario that justified the use of a decompiler. I have since remembered a much better example.

A customer had problems after upgrading Weblogic 8.1sp2 to 8.1sp3. His JSP - which was rather large - suddenly stopped working with 'method exceeds 64K' messages (the JVM caps the bytecode of a single method at 64KB). The JSP itself had not changed between the versions.

The obvious suspicion here was that BEA's JSP compiler had started to generate larger code. We had one known issue matching that, but reverting to the old behaviour via a flag switch didn't help.

Unfortunately, due to the complex nature of the JSP, I could not replicate the issue in my lab (too many dependencies and proprietary taglibs). So I got the customer to send me the class files generated by both versions of the JVM (JRockit's equivalents of Sun's 1.4.1 and 1.4.2).

The class file sizes were different!

So, given the same input JSP but different resulting class sizes, what would you suspect? I figured that the JSP compiler must have other, not yet identified, changes, and got the customer to send me the Java sources generated by both JSP compilers.

Now, reading the source generated by a JSP compiler is, in my eyes, just as much voodoo and 'under-the-covers' work as any other decompilation. But of course it was our (BEA's) compiler, so I had to do it anyway. There was no way to shift the blame to another vendor here.

So I slogged through the two source versions trying to identify the differences that were NOT comments. Eventually, I found that the sources were the same!

So, the same JSP produced the same Java source, yet the class sizes were different. This is where I had to pull out a decompiler - or, to be exact, the disassembler (javap).

Unfortunately, disassembling both classes produced so many differences at the bytecode level that I had to go back and try to create a replication.

To cut a long story short, it turned out that the difference in file sizes only appeared when there was a try/finally clause. From that point on, a simple test case showed that the content of the finally clause was being duplicated inside the try block. And - worse yet - with nested try/finally clauses, the outermost finally could get copied up to four times.
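To illustrate, here is a minimal sketch of the kind of code that triggers the duplication (my own example, not the customer's JSP). Compile it and run 'javap -c FinallyBlowup' to see the finally bodies repeated along every exit path of their try blocks:

    public class FinallyBlowup {
        // The body of each finally clause gets inlined into every
        // exit path of its try block, so nesting multiplies the
        // copies of the outer finally.
        static void doWork() {
            try {
                try {
                    System.out.println("inner work");
                } finally {
                    System.out.println("inner cleanup"); // inlined per exit path
                }
            } finally {
                // In a generated JSP servlet this would be a large
                // error-recovery block, copied several times over.
                System.out.println("outer cleanup");
            }
        }

        public static void main(String[] args) {
            doWork();
        }
    }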

As you can probably imagine, BEA's JSP compiler generated a lot of nested try/finally clauses in an attempt to recover from nearly any JSP failure possible. The resulting copies of large finally clauses were more than enough to blow the method size limit out of the water.

So, I had a replication and I had an explanation of what happened. But I still did not know why. As the problem was happening in the JRockit JVM, I escalated it to the JRockit engineers.

They came back and said that there was no JRockit change that could cause this. But they spoke to their Sun counterparts, and somebody mentioned an internal bug report that might have something to do with it. It took a while to find the internal partner's account number - not something they just give out to random support personnel.

Eventually, I got to read the bug and, sure enough, Sun had changed the behaviour of javac to generate duplicate sections (Bug ID: 4494152). And the reason they did it was so trivial that there must have been other ways to fix it. In fact, the change caused some ugly cascading bugs (1, 2).

In summary, there is no way I would have been able to identify the problem without breaking out javap. And no way would anybody else have been able to identify a non-public Sun change without a clear-cut replication case.

Disassemblers, decompilers, tracers, etc. are all effective tools of the developer. Refusing to ever use one on the grounds of 'vendor responsibility' is not really a supportable stance. Even book publishers think that way (1, 2).

BlogicBlogger Over and Out

Tuesday, March 22, 2005

Podcast: Fight, Fight: Doctorow vs. Scoble on AutoLink

You don't normally get to hear near-famous people fighting. Usually they have their responses polished, or they agree to disagree. Not so in this recent podcast on Google's AutoLink from ITConversations.

In this one, both Robert Scoble and Cory Doctorow lose their cool and start really throwing words around without listening to each other. Which is a pity, because it would have been nice to see them actually concede a point or two to the other side.

It will be interesting to see where the Great AutoLink FlameWar ends up, though I have to say I am on Cory's side. I have just started using Greasemonkey for Firefox and think it is Good.

Either way, I strongly recommend the podcast despite its high noise-to-signal ratio.

BlogicBlogger Over and Out

Re: Put down that decompiler

Andrew Savory(?) writes that if you have to stoop to decompiling a product to figure out a problem, it is the vendor's fault, because they chose not to supply you with the code and documentation. So if the vendor does not provide the source and the particular functionality is not well explained, it is completely the vendor's fault. I wish it were that simple.

I do agree that in an ideal world every feature would be documented and/or would work 100% correctly. In reality, things can be somewhat different.

To give an example, Weblogic server allows you to add your own security providers by dropping a jar into a particular directory. So one of BEA's customers gets a patch for the security provider code and dutifully drops it into the directory as per the detailed instructions.

And the fix does not work. A couple of support rounds happen, in which frontline support checks and rechecks that the customer has the right version of the file (yes), that it was put in the right directory (yes), and that the fix does match the problem (again, yes). All is correct, but nothing changes.

The case gets escalated and, out of desperation, we (effectively) stop believing the customer and ask them to provide a listing of the security provider directory showing the sizes and dates of all the files.

And then we realize that alongside the single jar (the patched provider bundle), there is another file in there called something like providers.jar.bak - an original copy of the bundle. Removing that file makes the problem go away. Looking at the source code, it turns out that every file in that directory is loaded as a jar containing a provider bundle. And so the old bundle kept being picked up.
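For illustration only, the loading logic behaved roughly like the sketch below (hypothetical code and names, not BEA's actual source). The key point is that nothing filters on the .jar extension:

    import java.io.File;
    import java.util.jar.JarFile;

    public class ProviderDirScan {
        public static void main(String[] args) throws Exception {
            // args[0] would be the security provider directory
            File[] files = new File(args[0]).listFiles();
            for (int i = 0; i < files.length; i++) {
                // No extension check here: providers.jar.bak gets
                // opened and loaded just like providers.jar.
                JarFile jar = new JarFile(files[i]);
                System.out.println("Loading provider bundle: "
                        + files[i].getName() + " (" + jar.size() + " entries)");
                jar.close();
            }
        }
    }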

Whoever wrote the documentation asked for the old provider to be backed up. And so it was. But nobody expected the backed-up file to stay in the same directory. A bug was filed and the issue was fixed.

But all this took a while, mostly because we did not have access to the customer's system and everything was being done at arm's length. If the customer had been conversant with a decompiler, they could have reverse-engineered the code that loaded the jars and either noticed the problem right away or put in log statements to print out when each jar was loaded (and seen it fire twice, rather than the expected once).

The lesson here is that, yes, this was dodgy and should have been fixed (and was). But there is a range of situations where the problem lies in shades of meaning, and reverse-engineering will often give a better and faster answer than jumping up and down and screaming for the vendor to fix 'something, anything, whatever it is'.

On a related note, it might have been possible to identify the same problem by running FileMon and noticing that the backup file was actually being read where it should not have been. But in my eyes, FileMon is just one of the tools in the reverse-engineering/decompiling toolkit, so it would be out of Andrew's reach as well.

BlogicBlogger Over and Out

Sunday, March 20, 2005

Re: Mike Clark's article on making software troubleshoot itself

Mike Clark is looking at automating first-line support by building some auto-support functionality straight into the program. It is an article worth reading.

One thing that the article does mention - in a somewhat negative way - is the 'secret' checklist a developer would go through to troubleshoot the application. He suggests that this 'secret' checklist can be automated as tests. In my own experience (with BEA Weblogic support), the checklist is usually more like a branching tree of decisions that would be fairly hard to encode as any simple set of JUnit tests.
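To be fair, the flat items of such a checklist do encode nicely; it is the branching that hurts. A couple of hypothetical JUnit-style checks (the class and the checked items are mine, not Mike's) show the 'check the obvious' level that automates well:

    import java.io.File;
    import junit.framework.TestCase;

    // Hypothetical example: 'flat' support checklist items expressed
    // as JUnit tests. The branching ('if X fails, check Y, otherwise
    // check Z') is what does not map onto this shape.
    public class DeploymentSanityTest extends TestCase {

        public void testConfigFilePresent() {
            assertTrue("config.xml is missing from the working directory",
                    new File("config.xml").exists());
        }

        public void testJavaVersionSupported() {
            String version = System.getProperty("java.version");
            assertTrue("unsupported JDK: " + version, version.startsWith("1.4"));
        }
    }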

I wonder how real Mike's examples were, and what percentage of the checklist he managed to automate.

To be perfectly clear, I do not think his article is wrong. Far from it: I think it is a great direction to explore in further detail. I just don't want people to accept support checklists as 'secret' in general. I think support techniques should be paid attention to and built into the great knowledge web that anybody can learn from, without having to rely on an expensive specialist at the other end of the line.

I know that there are a lot of FAQ-style websites dealing with specific problems, and they are good. But we also need some meta-level discussions on open forums as well.

Perhaps a support wiki or a Wikiversity course would be the way to go?

BlogicBlogger Over and Out