BlogicBlog: View from the trenches

A blog about Java and XML, with a focus on troubleshooting issues and tools.

Tuesday, November 30, 2004

Slashdot: Should vendors have root access to customer systems?

Permanent article link

Interesting - and at comments threshold 4, meaningful - discussion on Slashdot about whether vendors need root access to customers' equipment/OS. Eventually, it devolves into dumb vendors vs. dumb customers.

At BEA, we don't actually need root access to the system. In fact, you have to go out of your way to run on port 80 or hit other root-related issues. Comes with the Java territory, I guess. We actually prefer not to log into the customer's system at all, but rather to collect logs and analyse them on our own machines.

On the other hand, the discussion of who is smarter does strike close to my heart.

After years of working in Backline (Tier 3) support at BEA, I find myself liking two extremes of customers.
  • I like the customers who know what they are doing, how their system or network is set up and where their logs go. I like the customers who I can ask to run the software under nohup and redirect both stdout and stderr into the same file.
  • I also don't mind customers without much clue, as long as they know they are out of their depth. There have been a number of times when an engineer had to fix a system they weren't even aware of until that evening. And it usually is evening and - lo and behold - they have the whole night to sort it out.
    These customers will follow my directions to the letter, they will tell me what they see, and they will read back the error messages with precision. Yes, it takes a bit longer to go through the steps and the support engineer needs to really think on their feet, but it is still ok. And when you fix their problems, these customers do know that you helped them.

The customers I find myself not enjoying are those in the middle. They think they know what the problem is, so they send only partial logs, snip out 'unrelated' error messages and think the problem is obvious from a terse general description. (Try: 'Server crashed today. Please send patch.') For them, I find I have to explain every step in very clear, written-down detail.

As an example, we often request an access log and a server log covering the same time period. A customer like this will send an access log for the last 3 months and a server log from the last 3-hour run, for a problem that happened 2 days prior. I have to re-request the logs and in the process explain everything about log roll-overs, checking the log entries' timestamps and general log locations. Not something you would expect to have to do for a sysadmin of a large organisation.

In terms of case types, the most hated situation around the office here is a customer that creates a new case (again: 'Server crashed, please fix') and requests an engineer to join a bridge call where 30 people have already been discussing the issue for 8-30 hours.

Very often, we are not useful without seeing the exact exception and/or log files in context. Also, the BEA product family is very large and - even though we support all of it - we each have our own areas of expertise. My hanging-server skills are much stronger than my lost-JMS-message skills (which my coworker excels in), so just asking somebody to join the bridge (and stay on it until resolution) will actually benefit the customer less than spending an extra 10 minutes writing down a description.

For myself, I find it survivable to land in one of those bridge calls as long as the customer has a clue.

We have two very large USA telcos that like to do bridge calls. One has engineers with a very strong clue (even their CEO does), the other seems to provide no training OR resources to their so-called sysadmins. Guess which one I hate to work with? Guess which one I will NOT consider as my provider, since I can see how bad their problem-solving skills are.

Anyway, enough rant. If anybody is interested in specific aspects of support life, ask in the comments.

BlogicBlogger Over and Out

Monday, November 29, 2004

Link: Weblogic and Active Directory Authentication

Permanent article link

Luke Dewavrin is writing about what it takes to get Weblogic to use Microsoft Active Directory as an Authentication Provider.

He also mentions a couple of issues that people get burned by the first time they use the WLS Security Provider architecture.

Specifically, he talks about the need to set the JAAS control flag to "Sufficient" or "Optional". Let me reinforce that: the flag needs to be changed before the provider setup is saved. Otherwise, you most probably will not be able to restart the WLS instance again.
Fortunately, WLS 8.1 stores the information in config.xml, where it can be edited fairly easily. As a history note, WLS 7.0 stored the information in binary form outside of config.xml, and to edit it one would have to follow the recovery export/import procedures.

For myself, I have always found Active Directory hard to troubleshoot, mostly because of the inability to get good logs. By comparison, iPlanet Directory Server has very detailed and helpful access logs that I have used many times to isolate issues like incorrect configuration, slow authentication and even firewall timeouts. I tried to find an equivalent set of logs for AD, but could not. Maybe they call it something else entirely.

BlogicBlogger Over and Out

Sunday, November 28, 2004

Business model for ITConversations

Permanent article link

Doug Kaye is asking how to make his very reputable ITConversations podcast website actually bring in money.

For me, ITConversations is the only podcast that I listen to reliably, as I listen for content, not entertainment. The Pop!Tech series was absolutely riveting, especially WorldChanging.com and BioMimicry.

So, following is my idea, which I have been thinking about for a while and which I can see having worth beyond one site's commercial success.

The idea is based on having paid subscribers who then get to vote on the show they want to hear.
Following is a sample use case:
  1. I join ITConversations and pay $X deposit (say $20).
  2. At some point later, I find that I really want to hear (for example) Tim O'Reilly's views on podcasting. Or maybe I listened to a Memory Lane and want a follow-up on specific issues. Or I may want an interview on the state of the art of Machinima, which is just entering global consciousness.
  3. I create a request item and commit $Y of my deposit to it.
  4. The request goes to a vote with the other paid subscribers. They can discuss it and hopefully commit some of their own funds to it as a vote.
  5. Once the total amount reaches a threshold, the site owner (Doug) accepts the request and effectively makes a sale of a future podcast.
  6. When the show is produced, it is available to the people who committed funds to it.
  7. The sponsored show may also enter general availability some time later (or be available at a nominal cost to the other subscribers).

There are several advantages to this model:
  • There is no need for micro-transactions for each vote, as they are pre-aggregated by the deposit and can be managed via a top-up mechanism.
  • Since the money is already paid, the perceptual barrier to entry (for voting) is lower.
  • Voting means that the topics selected will be user-driven and more timely than trend spotting.
  • If the sponsored show is only available to subscribers (even for free), joining becomes very enticing, and joining requires a deposit. And once the deposit is made, it is easier to vote. This creates a positive feedback loop with more cash throughput.
  • Community around voting and discussion (which could be made public) will keep people coming back over and over again.
  • Advertisers will love seeing the vote results and perhaps target the ads at the specific subjects being discussed.
  • Money paid as deposits is transferred before being used up and therefore earns interest from early on.
  • Finally, once the system is implemented, it can be used for feature requests and other website work that requires a confirmed commitment rather than blue-sky ideas.


There are obviously some difficulties to resolve, such as amount thresholds, clear topic selection and satisfaction with paid content. But they are all solvable, at least in the long run.

I would be happy to answer further questions on this idea, if anything is not clear.

BlogicBlogger Over and Out

Thursday, November 25, 2004

Support Story: A tale of twenty cookies

Permanent article link

A very interesting story of what happens when built-in defaults have silent limits. In this story, each ADF application module (whatever those are) needed a cookie under the default configuration. This silently runs into spec limitations (browsers are only required to support 20 cookies per domain - hence the twenty cookies of the title).

Fortunately, Weblogic does not generate multiple cookies by default, though you could certainly assign one to each webapp.

For me as a support person, it is fascinating to read how other support people solve customers' problems. I wish I could read the internal case log for that particular support case.

If it were me, I would have asked the customer to capture the requests with ieHttpHeaders, enabled the Weblogic proxy debug log (if available) or put Ethereal on the wire.

BlogicBlogger Over and Out

Microrebooting paper's reasoning holes

Permanent article link


CIO Today noticed a research paper on microrebooting.
While I agree that the paper is very interesting, I think there are some holes in it that were not explored (or at least not explained).

Specifically:

  1. Early in the article, memory leaks and resource leaks are named as problems solved by rebooting.
    From what I can see, microrebooting will only solve the issues where those resources/objects are held by instances that are cleared on microreboot. It will do nothing for pooled resources leaked because a close/release method was never called (see the first sketch after this list). It will also do nothing if the memory leak comes from creating too many classes at runtime (e.g. Stubs) or if resources are held in static parts of classes. This is because the classloader is explicitly not recreated.

  2. The rebooting sequence the paper recommends for an application server is as follows:

    1. microreboot of the component transitive closure
    2. then kill -9 the server
    3. reboot the O/S.

    It seems that something is missing between 1) and 2). Where is the attempt to shut down the server via the normal shutdown command? That would allow the server to synchronise its buffers, finish writing out logs, etc. Going straight to kill -9 is really an emergency exit and is highly inadvisable. In the same vein, I am not sure how much good rebooting the O/S does for a Java AppServer. Of course, a proper shutdown takes time, but is a faster reboot worth getting your transactions into an in-doubt state due to a corrupted JTA in-flight store?

  3. Threads. The paper proposes killing all threads associated with the resources. I would be very interested in how they propose to do that well, given that Thread.stop() is deprecated beyond belief and Thread.interrupt() does nothing for threads stuck entering synchronized blocks - the common shape of deadlocks and bottlenecks (see the second sketch after this list).

  4. Memory recovery via microreboot. The paper suggests microrebooting components to free some space. A more effective approach, to my thinking, would be to have hooks into the component caches and to request that all cached content be dropped down to the working set or to the startup counts.
    So, a typical production system will start with 5 JDBC pool connections but may grow to 100. Asking it to drop back down to 5 (or the working set, whichever is larger) will free up a lot of cached result sets. The same goes for cached Entity beans, etc.

  5. Finally, the main requirement for all of this to work is a component storing its state in external transacted storage. Have they calculated whether the cost of shifting to external storage, instead of faster in-memory/in-place state, is worth the time benefits of microrebooting in the long run? I am not so sure, especially with the larger HttpSession data sets I have observed in the real world.
    The exact question here is: how often does one need to microreboot instead of fully rebooting to win back the time lost on satisfying the additional constraints? My own sort-of answer is that perhaps an extra half-second per response is an acceptable price, but I would like to know what the researchers themselves think.
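
To make point 1 concrete, here is a minimal sketch (hypothetical code, not from the paper) of a pooled-resource leak that a microreboot cannot cure: the pool outlives the microrebooted component, so a connection that was never returned stays leaked no matter how many times the component is recycled.

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;
    import javax.sql.DataSource;

    public class LeakyDao {
        private final DataSource pool; // the pool outlives any microrebooted component

        public LeakyDao(DataSource pool) {
            this.pool = pool;
        }

        public int countOrders() throws SQLException {
            Connection con = pool.getConnection();
            Statement stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM orders");
            rs.next();
            return rs.getInt(1);
            // BUG: con.close() is never called. A microreboot discards this
            // LeakyDao instance, but the connection stays checked out of the
            // pool - only the pool's own leak detection can reclaim it.
        }
    }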


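And to illustrate point 3, a small self-contained demonstration (JDK 5.0, for Thread.getState()) that Thread.interrupt() does not release a thread stuck entering a synchronized block - only wait/sleep/join respond to interruption:

    public class InterruptDemo {
        public static void main(String[] args) throws Exception {
            final Object lock = new Object();
            Thread t = new Thread(new Runnable() {
                public void run() {
                    synchronized (lock) {
                        System.out.println("worker got the lock");
                    }
                }
            });
            synchronized (lock) {
                t.start();
                Thread.sleep(1000);   // let the worker block on monitor entry
                t.interrupt();        // no effect: monitor entry is not interruptible
                Thread.sleep(1000);
                System.out.println("worker state after interrupt: " + t.getState());
                // prints BLOCKED - the worker only proceeds once we release the lock
            }
        }
    }
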
Still, all the nitpicking aside, Weblogic already provides some things that resonate with the idea, if not with the suggested implementation.

  • Component redeploy will try to remove all instances (and classes) at the EAR/EJB/WAR level and reload them. In fact, I am having trouble figuring out how the micro-reboot is better than this already-available functionality.
  • The node manager will monitor your system and restart the node if any of the subsystems goes into the 'warning' state.
  • JRockit allows you to monitor memory usage and fire code triggers when thresholds are crossed. You can do whatever you like at that point.
  • Weblogic 8.1sp3 monitors JDBC pool entries and can time out an inactive connection and consider it leaked. It also provides connection leak profiling, where a non-closed connection will scream when its finalize method is hit. It also provides resource retry for JDBC, similar in spirit to what the paper talks about.
  • JMX and SNMP also allow you to define all sorts of thresholds with notification triggers (see the sketch after this list).
  • The thread subsystem will monitor request processing time and will log a message when a single request takes longer than a threshold value to process. In the next version of WLS, it will also print the stack trace of the stuck thread. Notice that it will not kill the thread.
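
As an illustration of the threshold-with-notification idea, here is a minimal sketch using the standard java.lang.management API from JDK 5.0. The JRockit trigger API and WLS SNMP traps are separate mechanisms; this is just the generic JMX flavour of the same pattern (it assumes -Xmx is set, so getMax() is defined):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryNotificationInfo;
    import java.lang.management.MemoryPoolMXBean;
    import java.lang.management.MemoryType;
    import javax.management.Notification;
    import javax.management.NotificationEmitter;
    import javax.management.NotificationListener;

    public class HeapWatcher {
        public static void install() {
            // Ask each heap pool to notify us when usage crosses 80% of max
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                if (pool.getType() == MemoryType.HEAP
                        && pool.isUsageThresholdSupported()) {
                    pool.setUsageThreshold((long) (pool.getUsage().getMax() * 0.8));
                }
            }
            NotificationEmitter emitter =
                    (NotificationEmitter) ManagementFactory.getMemoryMXBean();
            emitter.addNotificationListener(new NotificationListener() {
                public void handleNotification(Notification n, Object handback) {
                    if (MemoryNotificationInfo.MEMORY_THRESHOLD_EXCEEDED
                            .equals(n.getType())) {
                        // You can do what you like at this point: log it,
                        // drop caches back to the working set, page somebody...
                        System.err.println("Heap threshold crossed: " + n.getMessage());
                    }
                }
            }, null, null);
        }
    }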


But I do welcome any research that makes the support job easier. Check out the other interesting papers at the Microrecovery and Microreboot center.

BlogicBlogger Over and Out

Link: Do's and Don'ts of Logging

Permanent article link

Jeff Mesnil is a developer who had to play 'support' for a bit. Specifically, he had to read logs produced by others. He immediately discovers a lot of lessons that are not obvious when you only read your own code.

To his list I want to add a couple of things from my own support experience:
  • Make sure you have some way to identify which log entries belong together. Use a transaction ID if you have one, the thread name if you don't. Weblogic logs include the thread name in each message. If this is too expensive, use the thread ID number instead and correlate it later with thread dumps.
  • Always put timestamps in. Again, if this is expensive, log System.currentTimeMillis() instead of (new Date()).toString(), and put a log entry at the start (or periodically) that shows both (to clarify timezones and allow correlation with other dated logs).
  • If you have a long operation, try to put a log message at the start and at the end of it. That way, when you later want to discover what your real-life processing time is, you can parse those entries (see the sketch after this list). Again, this requires IDs and timestamps. One of the frequent problems we see is customers running into default timeout values that are too small, without knowing what the right value for them would be either.
  • Remember that logging is expensive and (at least for log4j) is gated at the write method. So, if you have 100 threads all logging away into the same file, at some point they will all bottleneck on that write method. And if the write method needs to roll the file over or build a complex log string, they will all just wait. I have support-case scars to prove this point.
  • Try to grep your own logs instead of eyeballing them. If you have difficulty getting a good set of log entries, the person supporting your application will have even more problems.
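
To tie the ID, timestamp and start/end points together, here is a minimal sketch of the pattern (the class and transaction-ID names are made up for illustration; the idea, not the API, is the point):

    import java.util.Date;
    import org.apache.log4j.Logger;

    public class TransferService {
        private static final Logger log = Logger.getLogger(TransferService.class);

        public void transfer(String txnId) {
            // Cheap correlation data: transaction id, thread name, raw millis.
            String tag = "[txn=" + txnId
                    + " thread=" + Thread.currentThread().getName() + "]";
            long start = System.currentTimeMillis();
            log.info(tag + " transfer start at " + start + " (" + new Date(start) + ")");
            try {
                doWork(); // the long operation being bracketed
            } finally {
                long elapsed = System.currentTimeMillis() - start;
                // Matching start/end pairs can later be grepped and parsed
                // to find the real-life processing time distribution.
                log.info(tag + " transfer end, elapsed=" + elapsed + "ms");
            }
        }

        private void doWork() { /* the actual work */ }
    }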

BlogicBlogger Over and Out

EPIC is coming. Want to become its editor?

Permanent article link

A very strong (and quite dark) futuristic look at personalised news and media consolidation (8 minutes). Via Doc Searls. Really worth watching for anybody in the blogosphere.

The idea as such has been covered in Science Fiction before, but the anonymous author of this piece has used real names and very realistic-sounding dates.

After watching the piece, come back here and think about this. If EPIC is coming, do you want to be its editor? I do!

So, given 10 years, what skills do I need to cultivate to become one? Any suggestions?

BlogicBlogger Over and Out
P.S. I hope it gets put on archive.org or on P2P fast; otherwise the traffic will kill the website very quickly.

Technorati response to the Bloggercon Overload Session

Permanent article link

I wrote about my impressions of Bloggercon's Overload session before. Now Tantek (who works for Technorati) details specific technologies that could help implement the wish-list items mentioned.


BlogicBlogger Over and Out

Monday, November 22, 2004

Please don't make me support this WSDL 2 spec

Permanent article link

Rich Salz writes about a major problem with WSDL 2. I really hope it gets fixed before BEA decides to implement it. Otherwise, it will be completely unsupportable.

We already have problems handling customers bitten by vague specs, but this looks nightmarish.

BlogicBlogger Over and Out

Friday, November 19, 2004

On Bloggercon III Overload session

Permanent article link

It was very interesting listening to the Overload session recording from ITConversations.

Unfortunately, there are no transcripts of the recording available. Having to listen to the recording a second time just to pick out points worth commenting on is somewhat annoying.

One thing, however, I do want to say. Robert Scoble was the moderator, owing to his 1000-blogs-a-day reading routine. Unfortunately, he did not contribute enough of his own methods to provide much enlightenment. If anything, it was the opposite: he does not like categories, he does not spend much time thinking about what goes into his linkblog, nor did he seem to care for metadata.

However, I am sure he does have some techniques that would be very valuable to know.

So Robert, if you are reading this: how difficult would it be for you to video yourself going through your scanning and reading process? The way your eyes move, the pauses, the hands will surely tell us what is hard to describe in words. And if we could see what the screen shows at the same time (side by side with the camera view), it would be better still (more than twice as good, actually).

Perhaps one of the HCI labs has equipment that could do something like this.

I enjoy your videos of other people. I would enjoy one of you doing what we can only hope one day to achieve.

BlogicBlogger Over and Out

Monday, November 15, 2004

Re: Code Generation: good or bad?

Permanent article link

David Rupp writes - eloquently - that he does not understand why people are reluctant to use software that auto-generates code (e.g. Hibernate and AOP/AspectJ).

He is, of course, right. Code generation on the fly is in exactly the same league as JSPs, EJBs and dynamic proxies in terms of how the code one writes does not correspond to the code that actually runs.

But he is also wrong. What we have here is not a Good/Bad breakdown, but rather a spectrum. And the measure of that spectrum is how easy it is to trace the problem with the generated/compiled class back to something one can change.

In the case of JSPs, it is fairly easy to trace back, as most application servers allow you to keep the generated Java file and - Weblogic at least - will put the JSP's original line numbers as comments in the generated Java code.

The same applies to any offline precompiler. However complex they are, there is always a class file produced in the end that can be used as a reference.

Finally, a third-party library can - in a desperate situation - be decompiled, and the decompiled source will include the line numbers (unless the class has been stripped or obfuscated).

On the other hand, AFAIK runtime code generators do not usually bother putting in source line numbers, nor do they save anywhere the code blocks that become classes.

I have a case in point. A week ago, I had a support case with a ClassCastException in the dynamically generated wrapper class that Weblogic puts around the Oracle driver's proprietary methods. The top of the exception stack was:
at weblogic.jdbc.wrapper.Blob_oracle_sql_BLOB.getBinaryStream(Unknown Source)

Of course, I did not know at first that it was a generated class. I could have searched through the whole WLS source code base forever without finding it. The only hint was the class name. Seeing 'wrapper' triggered memories of other parts of Weblogic where we use similar techniques (RMI, EJB, etc.).

Eventually, I traced this to the part of the code that generates the class on the fly. And when I did, I found - as expected - that the class is generated in memory and loaded from a byte array, without any export or debug functionality.

Basically, we were generating a Weblogic class at runtime that implements all of Oracle's proprietary methods but adds pre- and post-processing to them. A not-so-dynamic proxy, in other words.
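
For readers who have not met the pattern, here is roughly what the same pre/post idea looks like with a plain java.lang.reflect.Proxy. To be clear, the real Weblogic wrapper is generated bytecode, not a reflective proxy - this sketch only shows the shape:

    import java.lang.reflect.InvocationHandler;
    import java.lang.reflect.InvocationTargetException;
    import java.lang.reflect.Method;
    import java.lang.reflect.Proxy;
    import java.sql.Connection;

    public class ConnectionWrapper implements InvocationHandler {
        private final Connection delegate;

        private ConnectionWrapper(Connection delegate) {
            this.delegate = delegate;
        }

        public static Connection wrap(Connection real) {
            return (Connection) Proxy.newProxyInstance(
                    real.getClass().getClassLoader(),
                    new Class[] { Connection.class },
                    new ConnectionWrapper(real));
        }

        public Object invoke(Object proxy, Method method, Object[] args)
                throws Throwable {
            // pre-processing hook
            System.out.println("before " + method.getName());
            try {
                return method.invoke(delegate, args);
            } catch (InvocationTargetException e) {
                throw e.getTargetException(); // rethrow the driver's real exception
            } finally {
                // post-processing hook
                System.out.println("after " + method.getName());
            }
        }
    }

At least with a reflective proxy the stack trace names the proxy machinery; the generated wrapper gave me nothing but 'Unknown Source'.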

I had to build a custom patch to dump the file out and then replicate the basic logic to trigger the class generation. Once that was done, I finally had a class file to decompile and analyse.

However, this was a comparatively simple problem, in that it was easy to reproduce and the generated code depended only on the classpath (Oracle drivers present), not on any runtime conditions. Were that not the case, there would have been no way for me to reliably confirm the code's behaviour.

The same issue will apply to woven-in Aspects once they become very popular. What is a boon for a developer is often a woe for the support engineer.

I think the lesson here is that any code generation framework must ensure that there is an easy way to get at the generated class's bytecode, whether through debug flags, explorable ClassLoaders or some other option.
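
A minimal sketch of what such a hook could look like (the class name and system property here are hypothetical, not an actual Weblogic feature):

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;

    public class GeneratingClassLoader extends ClassLoader {
        // e.g. run with -Dgenerated.class.dump.dir=/tmp/generated
        private static final String DUMP_DIR =
                System.getProperty("generated.class.dump.dir");

        public GeneratingClassLoader(ClassLoader parent) {
            super(parent);
        }

        // Defines a generated class, optionally dumping the bytecode
        // so a support engineer can decompile it later.
        protected Class defineGenerated(String name, byte[] bytecode) {
            if (DUMP_DIR != null) {
                try {
                    File out = new File(DUMP_DIR, name.replace('.', '/') + ".class");
                    out.getParentFile().mkdirs();
                    FileOutputStream fos = new FileOutputStream(out);
                    try {
                        fos.write(bytecode);
                    } finally {
                        fos.close();
                    }
                } catch (IOException e) {
                    e.printStackTrace(); // a debug aid must never break the real path
                }
            }
            return defineClass(name, bytecode, 0, bytecode.length);
        }
    }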

BlogicBlogger Over and Out

Monday, November 08, 2004

Koders: Another search mini-app

Permanent article link

It looks like search is starting to gain momentum, both desktop and service based. In the last couple of weeks, we had:

Desktop Search

Web-based search miniapps

  • JarHoo will help you locate the Jar where a class is defined.
  • Koders will allow you to find which open source project contains code you may need and will show some interesting stats about it (e.g. projected reimplementation time). It may or may not be based on Lucene for the search component.
    I have to say, though, that I did not feel like the target audience for Koders. I wanted to get in, type the class name, find the useful project that had it and go to that project's home. I could do all but the last one. Instead, when I clicked on the project's name, I got drawn deeper into the Koders site and was offered statistics and source code to look at. Oh well, Google is always a click away.


I think in the future we will see yet more search miniapps, possibly very specialized ones. I can think of several that would be useful for my BEA support work.

Just as an example, I would like a small application that would index the elements of both standard and vendor-specific deployment descriptors and then allow me to search by name substring. That way, I could search for 'timeout' and get a general overview of which timeouts are configurable, in which file, and exactly where in the deployment descriptor hierarchy (a rough sketch follows).
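
A first cut would not even need a search engine. A sketch along these lines (the directory layout and the 'timeout' query are just examples) walks a pile of collected descriptors and prints the hierarchy path of every element whose name contains the substring:

    import java.io.File;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;

    public class DescriptorGrep {
        public static void main(String[] args) throws Exception {
            File dir = new File(args[0]);          // directory of collected descriptors
            String query = args[1].toLowerCase();  // e.g. "timeout"
            // Note: a real version should avoid fetching external DTDs here.
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            File[] files = dir.listFiles();
            for (int i = 0; i < files.length; i++) {
                if (files[i].getName().endsWith(".xml")) {
                    Document doc = builder.parse(files[i]);
                    walk(doc.getDocumentElement(), "", query, files[i].getName());
                }
            }
        }

        // Recursively print the path of every element whose name matches.
        private static void walk(Element e, String path, String query, String file) {
            String here = path + "/" + e.getTagName();
            if (e.getTagName().toLowerCase().indexOf(query) >= 0) {
                System.out.println(file + ": " + here);
            }
            NodeList children = e.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child instanceof Element) {
                    walk((Element) child, here, query, file);
                }
            }
        }
    }

Running something like 'java DescriptorGrep ./descriptors timeout' would then give the overview in one shot.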

Any takers?

The biggest problem, of course, is who will pay for the traffic. JarHoo was running Google ads, but it still needed to be sponsored by a hosting company. Such goodwill may not last long.

BlogicBlogger Over and Out

Sunday, November 07, 2004

I always was curious what writing for O'Reilly was like

Permanent article link

Guillaume Laforge writes about the 'New Author' treatment.
I hope he will continue writing about Life as an O'Reilly Author.

I also wish somebody with a different publisher would write about their experiences.

BlogicBlogger Over and Out

Thursday, November 04, 2004

What type of a Framework Fashionista are you?

Permanent article link

Lara D'Abreo decided to classify developers according to their attitude to Java Frameworks.

I feel I can claim to be Pragmatic, but the truth will most probably reveal at least some Oblivious in me as well. I also seem to remember a Dogmatic period in my career, but I have (hopefully) outgrown it a while ago.

BlogicBlogger Over and Out

Tuesday, November 02, 2004

Is gmail planning to introduce new domain names?

Permanent article link

I have a gmail account. Today, when I signed in and then signed out, I suddenly noticed that the 'username' field now shows my full email address instead of just the username: 'user@gmail.com' instead of 'user'.

If I am not imagining things and this is a real change, then there is only one reason I can guess for it: Gmail is planning to introduce additional domains (as mail.com did), and automatically substituting the full address at login is a transition step.

Any guesses on the additional names? Gbrowser anyone?
BlogicBlogger Over and Out

Goodbye Dev2Dev, Hello CodeShare

Permanent article link

A while ago, BEA introduced Dev2Dev as a way to reach out to the BEA community. The website has gone through at least 3 changes but was still not overly popular. Finally, BEA realised that a 3rd party would do a better job of it.

Meet CodeShare, a new project developed by BEA with great help from O'Reilly Media and CollabNet.

The biggest problem with Dev2Dev, in my opinion, was that you could publish something to it, but it was hard to update. Some of the older sections (like utilities) were also hard to find after a website redesign.

Hopefully, CodeShare - which in many ways seems modelled after SourceForge - will fix that. I, for one, am looking forward to using the new site.

BlogicBlogger Over and Out

Monday, November 01, 2004

Standalone XSLT/XQuery tools from Altova

Permanent article link

Altova, the maker of XMLSpy, has released several standalone engines that were previously only available as part of the editor itself. They have XSLT 1.0, XSLT 2.0 and XQuery engines. The XSLT 1.0 implementation looks very complete; the XSLT 2.0 one less so.

Of course, my own preference is XMLStarlet. I call it the 'find' of the XML world, because I like to do a lot of ad-hoc queries on XML files. XMLStarlet is also available on multiple platforms (Altova is for the Windows world only).

Still, the more engines are out there, the better. We can always do a consolidation later, once the needed feature set is fully understood.

BlogicBlogger Over and Out

Jarhoo is a good step in the right direction

Permanent article link

If you often wonder which jar a particular class comes from, Jarhoo has indexed a number of common distributions of Java applications and libraries and allows you to do a quick lookup by class-name substring or jar-file-name substring.

Among other applications, it includes BEA Weblogic, so it is sometimes useful to me in troubleshooting classpath issues.

There are some growing pains still:
  • It does not indicate which Service Pack of a product it uses. One assumes the latest.
  • It does not allow you to add or remove a product from the search list. It would be useful to be able to narrow the search by adding - for example - +product:WLS81.
  • There seems to be some sort of double-escaping of the < sign in the Jar File Locations entries.
An interesting thing is that, just from looking at the interface, I can tell it is running Lucene under the hood. The '2 seconds' response time especially is a telling sign.

I wish the project well and I hope that, based on its success, many more mini apps will appear. It is certainly the right direction.

BlogicBlogger Over and Out