Interminably Long Timeouts for META-INF Under IIS

How's that for a catchy title? To recap recent events: I'm working on speeding up a Java applet; sloppy code in the applet's libraries tries to load resources from the server, which then 404; you can avoid this by setting the codebase_lookup parameter to false in the applet tag; and eliminating 23 megs of invisible data helps speed up downloading. Now that we're all caught up, let's turn to today's adventure: "deployment nightmares" OR "why the hell doesn't my test environment match production?"

Applet Won't Load

I finally got the applet to the "good enough for government work" level of load-time performance. Understand that I don't even work on any of the code in the applet; I'm just trying to optimize what's there and how it's delivered from the server. Today was the day we decided to quietly deploy to production.

The first sign of a problem was when the Apple guys came into the office. The applet wouldn't load for them in Safari, Firefox 2, or Firefox 3, yet it worked fine against every server they tried except production. As we dug in, it turned out that the problem affected every machine running JRE 1.5, regardless of OS or browser. They all worked against every server except production.

Differences Between Production and Test Environments

In production we have some sort of load balancer, Tomcat sits behind IIS, and it's on an external network. Nothing in our test environment has a load balancer in front of it, only one test machine runs IIS (and it works fine), and my external EC2 deployment is obviously off our network. I'm not sure why we don't mirror as much of production as possible in our test environment, but we don't.

Now back to the bug. Turning off the load balancer had no effect. Eventually, someone let their browser sit long enough to see that the applet did in fact load. It just took around 10 minutes. I finally noticed that the Java console would hang on various non-existent resources it tried to load from the server. I used cURL to request one of those URLs and had to wait 2 minutes until it returned an empty reply. Requests for most non-existent resources failed immediately. Only URLs that contained META-INF or WEB-INF would hang.
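The cURL check can be scripted so that a whole list of suspect paths gets timed in one pass. Here's a minimal sketch in Python, not what I actually ran: the function names, the 5-second "slow" threshold, and the 150-second timeout (just long enough to outlast a 2-minute hang) are all my own assumptions, and the fetch parameter exists only so the timing logic can be tried without a network.

```python
import time
import urllib.error
import urllib.request


def http_fetch(url, timeout):
    """Return the HTTP status for url, or a short label if no usable response arrives."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # a 403 or 404 still counts as a prompt answer
    except (urllib.error.URLError, OSError) as err:
        # Covers timeouts, refused connections, and empty replies
        return "error: %s" % err.__class__.__name__


def probe(base_url, paths, fetch=http_fetch, timeout=150, slow_after=5.0):
    """Time each request and flag the ones that take longer than slow_after seconds."""
    results = []
    for path in paths:
        start = time.monotonic()
        outcome = fetch(base_url + path, timeout)
        elapsed = time.monotonic() - start
        results.append((path, outcome, elapsed, elapsed > slow_after))
    return results
```

Calling probe() with a base URL and a list such as ["/missing.class", "/META-INF/MANIFEST.MF"] (both made up for illustration) returns one (path, outcome, seconds, slow) tuple per path, so the hanging ones stand out right away.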

Various third-party libraries were trying to load odd things from the server, as I mentioned previously. A few of these load attempts point at the META-INF directory. This only happens under 1.5 because I used the codebase_lookup parameter in the applet tag. Tomcat, Apache in front of Tomcat, and our internal IIS server all respond immediately. The first two serve a custom 404 page, while the IIS server sends an immediate empty reply.

WEB-INF and META-INF Protection

Both WEB-INF and META-INF are directories that you probably shouldn't be exposing. In fact, most versions of the Tomcat Connector automatically return a 403 or 404 when any resource from those directories is requested. In our case, we were running an older version of the connector that happened to have a bug that caused requests to either directory to take 2 minutes to time out. A quick upgrade and an IIS service bounce fixed everything.
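To make the rule concrete, here's a rough sketch of the kind of check a front end can apply before forwarding a request. This is not the Tomcat Connector's actual code; the function names are hypothetical, and the choice of 404 over 403 is arbitrary.

```python
from urllib.parse import unquote

PROTECTED_DIRS = {"web-inf", "meta-inf"}


def is_protected(path):
    """True if any segment of the request path is WEB-INF or META-INF."""
    # Drop the query string, decode %-escapes, and treat backslashes as separators
    path = unquote(path.split("?", 1)[0]).replace("\\", "/")
    return any(seg.lower() in PROTECTED_DIRS for seg in path.split("/"))


def status_for(path):
    """Answer protected paths with 404 right away; None means forward to Tomcat."""
    return 404 if is_protected(path) else None
```

The point is that the decision needs nothing but the path, so a request for a protected directory should come back immediately, which is exactly what the buggy connector failed to do.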

So the debugging lessons for the day are: use something like ngrep to watch your traffic, make your test environment mirror your production environment, applets under 1.5 suck, and check the version numbers on your third-party libraries (and consider upgrading).

