BlogicBlog: View from the trenches

The blog about Java and XML with focus on troubleshooting issues and tools.

Sunday, February 26, 2006

What's in the access log

Permanent article link

What is a server access log and how much information can one actually extract from it? If you don't know, read this good introductory article on the subject.

And if you find the article interesting, look for other logs your server/application produces. The easiest way to find out what log files you have is to run ProcessExplorer on Windows or lsof on *nix and see which .log files the program is keeping open and where. Doing this may also help you discover where all the disk space has gone.
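
As a rough sketch of the lsof route: the snippet below parses lsof's field output (`lsof -F pn`, which prints a `p<pid>` line per process and an `n<name>` line per open file) and keeps only the names ending in `.log`. The function name `open_log_files` and the sample text are illustrative assumptions, not output from a real server.

```python
from collections import defaultdict


def open_log_files(lsof_field_output: str) -> dict[int, list[str]]:
    """Map each PID to the .log files it holds open.

    Expects text in the shape produced by `lsof -F pn`: a 'p<pid>' line
    starts a process section and each 'n<name>' line names one open file.
    Other field lines (such as 'f<fd>') are ignored.
    """
    logs: dict[int, list[str]] = defaultdict(list)
    pid = None
    for line in lsof_field_output.splitlines():
        if line.startswith("p"):
            pid = int(line[1:])
        elif line.startswith("n") and pid is not None:
            name = line[1:]
            if name.endswith(".log") and name not in logs[pid]:
                logs[pid].append(name)
    # Drop processes that have no .log file open.
    return {p: files for p, files in logs.items() if files}


# Made-up sample in the field-output format (hypothetical paths and PIDs).
sample = (
    "p412\nf3\nn/var/log/app/server.log\nf4\nn/usr/lib/libc.so\n"
    "p977\nf1\nn/dev/null\n"
)
print(open_log_files(sample))
```

On a real machine the input would presumably come from running lsof itself, for example via `subprocess.run(["lsof", "-F", "pn"], capture_output=True, text=True).stdout`.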

BlogicBlogger Over and Out

Saturday, February 25, 2006

More GraphViz goodness

I wrote about GraphViz before, but so many new Java projects have shown up that I thought it was worth an update.
  1. Grand: Ant config file visualizer that is even better than the other options I wrote about before.
  2. LightUML: UML generator from Java classes
  3. Linguine Maps: Visualizer library for many types including WSDL, Ant, Hibernate and DTD
  4. JarAnalyzer: Dependency visualization for jar files (with an independent review)
If you have anything where A and B point to C and C points to D, GraphViz is really worth checking out.
It is not in Java however; if you only look at Java products, check out Prefuse instead.
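
To make the "A and B point to C and C points to D" case concrete, here is a minimal sketch that writes such a graph in GraphViz's DOT language. The helper name `to_dot` and the graph name `deps` are made up for illustration.

```python
def to_dot(edges, name="deps"):
    """Render (source, target) pairs as a GraphViz DOT digraph."""
    lines = [f"digraph {name} {{"]
    for src, dst in edges:
        # Quote node names so that names with spaces or dots stay valid.
        lines.append(f'    "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


edges = [("A", "C"), ("B", "C"), ("C", "D")]
print(to_dot(edges))
```

Saving the printed text as `deps.dot` and running `dot -Tpng deps.dot -o deps.png` should produce the picture.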

BlogicBlogger Over and Out

Friday, February 17, 2006

The good and bad of outsourcing to Russia

Yakov Fain writes (from experience) about good and bad aspects of Russian programmers for outsourcing.

I left Russia well before American companies thought of trusting Russian (or Indian) programmers. But I remember the software and ingenious hacks my friends wrote back then to work with the limited computing resources available. I participated in programming competitions and summer camps, where school kids learned and applied concepts that in other countries are only taught at an advanced level of university education.

To this day, I claim to have gone through two different schools of programming. One is Russian and one is Australian. And you shouldn't have to guess (after this article) which one I feel gave me a stronger grounding in what I do daily.

BlogicBlogger Over and Out

Wednesday, February 15, 2006

Good-bye ProcessExplorer - your license got too strict

I used to rave to everybody about how good ProcessExplorer from Sysinternals was for technical troubleshooting. Oops, I guess I was too loud.

The new license terms state:

A commercial license is required to use the software in any way not covered above, including for example:

  • Use of the software for technical support on customer computers
This license does appear in ProcessExplorer 10.05. The license is not yet in my other favourite tool (FileMon), but I assume it will get rolled out with the next release.

This may still allow users to download and install the product themselves, but even that seems to violate the spirit if not the letter of the license.

I was going to present at JavaOne with very strong emphasis on the tools from Sysinternals, but now I may have to rewrite that presentation.

I can understand that the company wants to drive the customers towards their commercial offerings, but it is still sad to suddenly be able to do less rather than more with the new release.

BlogicBlogger Over and Out

Tuesday, February 14, 2006

Re: Guerrilla Debugging For Java

Russ Olsen (via Michael Baum) writes about tools to use in production, when one does not have access to the usual tools. Specifically, when one does not have access to tools like the Eclipse IDE.

He mentions a good list of tools, though he has a bit too much praise for strace. To quote:
Ever wonder why your program can't open that one file? Use strace to find out precisely which file it is trying to open and the exact identity of the error. Does your program simply go into a trance and stop responding? There is a fair chance that you will find out why with strace. Clueless as to why your webapp is leaking memory? Strace may supply the clue.
In reality, there are much better tools for those tasks. At least on Windows, FileMon is a much better tool for checking which files are being opened, and Handle is great for checking which process locks which files (lsof does that on Unix). Thread dump analysis (my tool) is better for checking what's hanging. And Ethereal is a must for any network troubleshooting.

But the general point is strong. And the section about BeanShell was something I hadn't thought of before. Very good summary article.

BlogicBlogger Over and Out

Tuesday, February 07, 2006

Quoted, but misunderstood: What's Missing from Production System Troubleshooting

Michael Baum quotes my feedback on his survey article, but completely misses that we actually want the same thing. We just approach it from different angles.

To get to the (misunderstood) point:
The notion that IT people need even more data generated by developers kinda misses the point. Troubleshooting production applications is a whole lot different that debugging code in development or staging environments. Production systems involve many technologies and systems that just don't appear in pre-production environments.
I do not ask for more data. I ask that the data format currently in use be reviewed to see whether it is actually useful for troubleshooting/monitoring, and that a concerted effort be made to change the format where it proves not to be useful.

For example, any message produced by a multithreaded service must include a thread/transaction id. Trying to extract a sequence of events out of logs that just intermingle entries from different threads is next to impossible. The same problem arises with timestamps in a format that does not allow correlation with other log types.
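
As one possible illustration (in Python's standard `logging` module rather than any particular server's logger), the sketch below puts the thread name, a transaction id and a UTC ISO-8601-style timestamp on every line. The `txn_id` field, the logger name and the exact layout are assumptions made for the example.

```python
import logging
import time

# Timestamp first, then thread and transaction id, so that entries from
# different threads can be separated and lined up against other logs.
LOG_FORMAT = ("%(asctime)s.%(msecs)03dZ [%(threadName)s] "
              "[txn=%(txn_id)s] %(levelname)s %(message)s")


def make_formatter() -> logging.Formatter:
    formatter = logging.Formatter(fmt=LOG_FORMAT, datefmt="%Y-%m-%dT%H:%M:%S")
    formatter.converter = time.gmtime  # UTC, independent of server locale
    return formatter


def make_logger(name: str = "service") -> logging.Logger:
    """Build a logger using the format above (call once per logger name)."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(make_formatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


log = make_logger()
# txn_id is a hypothetical field; every call has to supply it via `extra`.
log.info("order accepted", extra={"txn_id": "T-1001"})
```

With every entry carrying its own thread and transaction id, a plain grep for one id should be enough to pull a single request's sequence out of an intermingled log.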

My point was that developers would not see these kinds of issues until they have to do the troubleshooting themselves. Then they might be more amenable to pleas for better logging formats.

And yes, I did spend 3 years as a technical support engineer for BEA looking at multi-megabyte (sometimes multi-gigabyte) log files for people whose configuration I did not know 30 minutes earlier. So I believe I did have to deal with the issues Michael has seen at Yahoo and at Splunk too. In fact, I will be delivering a JavaOne presentation about this very issue this year.

Speaking of Splunk, it is a great idea and a step in the right direction (as I wrote 9 months ago). I could see how it would have been useful to me when dealing with large data sets.

Unfortunately, it is only a first step; to replace my advanced troubleshooting environment (*cough* Vim *cough*), I would need to at least be able to highlight several patterns at the same time in different colors (e.g. IP address, time sequence and URL type).
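
To show what I mean by several patterns in different colors, here is a small sketch that wraps matches in ANSI terminal colors. The three regular expressions and the color choices are simplified assumptions for illustration, not what Vim or Splunk actually do.

```python
import re

# (name, pattern, ANSI color). Deliberately simple, illustrative patterns.
PATTERNS = [
    ("ip", r"\b\d{1,3}(?:\.\d{1,3}){3}\b", "\033[31m"),       # red
    ("time", r"\b\d{2}:\d{2}:\d{2}\b", "\033[32m"),           # green
    ("url", r"/[\w./-]+\.(?:jsp|html|do)\b", "\033[34m"),     # blue
]
RESET = "\033[0m"

# One combined regex with a named group per pattern, so a single pass
# over the line colors all three kinds of match.
COMBINED = re.compile("|".join(f"(?P<{n}>{p})" for n, p, _ in PATTERNS))
COLORS = {n: c for n, _, c in PATTERNS}


def highlight(line: str) -> str:
    def paint(match):
        # lastgroup is the name of the alternative that matched.
        return f"{COLORS[match.lastgroup]}{match.group(0)}{RESET}"
    return COMBINED.sub(paint, line)


# Made-up access log line.
entry = '10.0.0.7 - - [26/Feb/2006:10:15:32 +0000] "GET /shop/cart.jsp HTTP/1.1" 200'
print(highlight(entry))
```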

But I will be evaluating Splunk in more detail and will probably mention it in my presentation at JavaOne. Especially if it becomes downloadable as a VMware image to try under a Windows environment.

(Update from Feb 14th: we are on the same page now)

BlogicBlogger Over and Out

Gmail and the periodic spam filtering failure

Gmail is usually pretty good about spam filtering. But not at the moment. As of the last 2 or 3 days, the spam I know it caught before is now ending up in my mailbox. And I think I know why.

The same thing happened over Christmas. Suddenly all the spam appeared in my inbox. I had a theory then and it seems to be confirmed now.

I think the Gmail team lowers the filtering level around big celebrations (Valentine's Day is coming up). Around these events, people send each other emails that differ from their normal day-to-day messages (warning: humour). It is very possible that Gmail's normal algorithms would treat celebratory emails as spam and people would end up losing messages.

So, to avoid classifying good emails as spam, the Gmail team lowers the filtering level. That in turn makes some spam emails look normal, which annoys people in a different (more habitual and therefore less important) way.

I hope this is not the case and that I am just observing some fixable glitches, but this is the second time now, which makes it a coincidence (up from mere chance). If it happens again around May (Mother's Day), I will think I have a full-blown theory.

Oh and any Gmail engineer is more than welcome to comment. :-)

BlogicBlogger Over and Out

Friday, February 03, 2006

The techie way of liberating the podcast URL from iTunes

Jon Udell is not happy about having to transcribe podcast URLs that iTunes displays but does not allow you to copy. While the general point about lock-in is good, here is a quick techie workaround in the meantime.

However much iTunes may want to hide the URLs, at some point it has to actually retrieve something from them, and it does so using a standard network protocol.

Enter Ethereal, an open-source multi-platform network protocol analyser. Using it to get the URL is serious overkill, but the tool itself comes in useful over and over again in many situations.

The steps to snag the URL are:
  1. With iTunes already open, start Ethereal and run a capture in non-promiscuous mode with 'tcp' as the capture filter.
  2. Go to iTunes and do 'Update podcast' on the podcast you want.
  3. Let the update happen and stop the Ethereal capture. You should now have a lot of packets in Ethereal's view.
  4. In the display filter section (this is a different filter from the one before) enter: http.request.uri contains "xml" .
  5. This should show only the packets that are HTTP requests with xml in their URI. If nothing shows up, try http.request.method as the filter. If there is still a problem, try http and see whether you have any traffic on the chosen interface at all.
  6. Assuming you finally have the right request, right-click it, choose Follow TCP Stream, and copy the URI and host from there.
  7. You are done. It sounds a bit long, but takes about 30 seconds once installation is done and the correct interfaces are figured out.
For more advanced HTTP work, it is also worth going to the Ethereal preferences and making sure that packet reassembly is enabled for all levels of the network stack (HTTP, TCP, IP).
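
The last manual step (copying the URI and the host out of the followed stream) can be sketched in a few lines. The function name `request_url` and the sample request, including the `example.com` host and feed path, are made up for illustration; a real capture will look different.

```python
def request_url(raw_request: str) -> str:
    """Rebuild the URL from the text of a single plain-HTTP request."""
    lines = raw_request.splitlines()
    # Request line: METHOD, then the URI, then the protocol version.
    _method, uri, _version = lines[0].split(" ", 2)
    if uri.startswith("http://"):
        return uri  # already absolute, e.g. when sent through a proxy
    host = ""
    for header in lines[1:]:
        if not header:
            break  # a blank line ends the headers
        key, _, value = header.partition(":")
        if key.strip().lower() == "host":
            host = value.strip()
            break
    return f"http://{host}{uri}"


# Hypothetical request text, as it might appear in Follow TCP Stream.
captured = (
    "GET /feeds/podcast.xml HTTP/1.1\r\n"
    "User-Agent: iTunes\r\n"
    "Host: example.com\r\n"
    "\r\n"
)
print(request_url(captured))
```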

BlogicBlogger Over and Out

Thursday, February 02, 2006

Re: How Hard is it to Troubleshoot IT Anyway?

Michael Baum reports on the survey of system administrators regarding their troubleshooting activities. It is an interesting summary, but something is missing.

There seem to be a lot of questions about how problems are handled now, with the predictable answers of basic power tools like grep, perl and Ethereal. What I don't see are any questions on how to fix the problem going forward.

By now we have pretty much established that until the developers themselves try to support/troubleshoot their own products in production (or get loud enough feedback), they will not understand how to make their products easier to manage post-deployment.

Surveys of the "how do you deal with it now" kind should always include questions on why commercial solutions are not suitable (usually due to installation/license difficulties) and also on what the companies creating the products could do to make things easier in the long run.

I know some companies are slowly doing it on their own (e.g. DTrace from Sun), but I think that, if backed by organisations such as LOPSA or NaSPA, progress could be faster. After all, by now we have pretty much established that the problems will not go away by themselves but, if anything, will get worse.

And if the System Administrators want to join forces with other technical people running into the same problems, they should pay more attention to technical support people as well as to forensic analysts. Both of these groups also have to deal with finding a needle of important information in a mountain of obscure, disjointed, overwhelming data. The progress in the area of forensic analysis tools is especially fast these days, as it is driven by the very high profile security concerns (most of the February issue of Communications of the ACM is about this very topic).

BlogicBlogger Over and Out