Squid Corrupts Jar Files?

To speed up the launch of a 3+ MB WebStart application for users behind an overworked firewall, I installed the open-source Squid web cache on a Linux machine inside the firewall. With clients configured to load jars (and other content) through the Squid proxy, each jar file had to be pulled through the firewall only once. The resulting speed boost at peak usage times was dramatic.

Several weeks after the cache was deployed, users began reporting startup failures on the morning after one of the application’s jar files was updated. The WebStart download window reported “Failed to load resource http://…”, and the Details button on the subsequent “Unable to launch” dialog revealed a “Corrupted JAR file at http://…” error. The application launched without error on machines outside the firewall, and also worked fine inside the firewall if WebStart was configured to bypass the proxy.

We eventually determined that the problem originated in an incomplete configuration of the Apache web server from which the jars were loaded. Under recent versions of Linux (and other unix-ish operating systems), Apache uses the contents of /etc/mime.types to choose the value of the HTTP Content-Type header returned in response to each GET request, based on the file name extension of the requested file. If no MIME type is found for a given extension, Apache falls back to a default specified in its configuration file. The standard Apache configuration (e.g., /etc/httpd/conf/httpd.conf) includes this segment:


#
# DefaultType is the default MIME type the server will use for a document
# if it cannot otherwise determine one, such as from filename extensions.
# If your server contains mostly text or HTML documents, "text/plain" is
# a good value.  If most of your content is binary, such as applications
# or images, you may want to use "application/octet-stream" instead to
# keep browsers from trying to display binary files as though they are
# text.
#
DefaultType text/plain

Unfortunately, the default configuration of /etc/mime.types does not include an entry for the .jar extension, so Apache will claim that jar files are plain text.
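The lookup can be modeled with a short Python sketch. It builds an extension-to-type map from mime.types-style text and falls back to a DefaultType value when the extension is missing. This is a simplified model, not Apache's actual implementation, which also handles things like multiple extensions and AddType directives. The function names parse_mime_types and content_type_for and the sample table are hypothetical.

```python
from typing import Dict


def parse_mime_types(text: str) -> Dict[str, str]:
    """Build an extension -> MIME type map from mime.types-style text."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        mime_type, *extensions = line.split()
        for ext in extensions:
            mapping[ext.lower()] = mime_type
    return mapping


def content_type_for(filename: str, mapping: Dict[str, str],
                     default_type: str = "text/plain") -> str:
    """Pick a Content-Type from the file extension, falling back to DefaultType."""
    _, dot, ext = filename.rpartition(".")
    if dot and ext.lower() in mapping:
        return mapping[ext.lower()]
    return default_type


# Hypothetical excerpt, not a complete /etc/mime.types.
MIME_TYPES = """\
application/octet-stream        bin dms lha lzh exe
text/html                       html htm
"""

mapping = parse_mime_types(MIME_TYPES)
print(content_type_for("index.html", mapping))  # text/html
print(content_type_for("app.jar", mapping))     # text/plain: no jar entry, so DefaultType wins
```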

WebStart happily ignores the value of the Content-Type header, so this is normally not a problem. Squid, however, manages cached objects differently based on their content type. As a result, jar files labeled text/plain may be stored in the Squid disk cache in a way that alters their content.
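When chasing this kind of failure, it helps to confirm whether a particular copy of a jar is actually damaged. A jar is a zip archive, so Python's standard zipfile module can check it. The sketch below returns False if the bytes do not open as a zip, or if any entry fails its CRC-32 check. The helper name jar_looks_intact is hypothetical, and it only checks archive integrity. Comparing a proxied copy byte for byte against one fetched directly from Apache is a stricter test.

```python
import io
import zipfile
import zlib


def jar_looks_intact(data: bytes) -> bool:
    """Return True if `data` opens as a zip archive and every entry passes its CRC check."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as jar:
            # testzip() reads every entry and returns the name of the first bad one, or None.
            return jar.testzip() is None
    except (zipfile.BadZipFile, zlib.error, EOFError):
        # Missing central directory, bad local header, or broken deflate stream.
        return False


# Demo with a tiny in-memory jar (hypothetical contents).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as jar:
    jar.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
good = buf.getvalue()
print(jar_looks_intact(good))                    # True
print(jar_looks_intact(good[: len(good) // 2]))  # False: truncated archive
```

To check a real download, read the jar file in binary mode and pass its bytes to jar_looks_intact.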

The fix is to tell Apache that jar files contain binary data. To do this, find the line in /etc/mime.types that starts with application/octet-stream and add jar to its list of extensions. For example, the line in /etc/mime.types might look like:

application/octet-stream        bin dms lha lzh exe

Changing this to:

application/octet-stream        bin dms lha lzh exe jar

and restarting Apache solves the problem.
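If you would rather script the edit than make it by hand, the Python sketch below applies the same one-line change to mime.types-style text. The helper name add_extension is hypothetical. It assumes the usual "type ext ext ..." layout and only appends jar to an existing application/octet-stream line. Some distributions may already map .jar to another type (such as application/java-archive) elsewhere in the file, so check for that first.

```python
def add_extension(mime_types_text: str, mime_type: str, ext: str) -> str:
    """Return mime.types-style text with `ext` appended to the line for `mime_type`.

    Lines are matched on their first whitespace-separated field, so comment lines
    (which start with '#') are left alone. If `ext` is already listed, nothing changes.
    """
    out = []
    for line in mime_types_text.splitlines(keepends=True):
        fields = line.split()
        if fields and fields[0] == mime_type and ext not in fields[1:]:
            body = line.rstrip("\r\n")
            line_ending = line[len(body):]  # preserve the original newline, if any
            line = f"{body} {ext}{line_ending}"
        out.append(line)
    return "".join(out)


before = "application/octet-stream        bin dms lha lzh exe\n"
print(add_extension(before, "application/octet-stream", "jar"), end="")
# application/octet-stream        bin dms lha lzh exe jar
```

Write the result back to /etc/mime.types as root, after making a backup. Then restart Apache so it rereads the file.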

Note that this change will not force Squid to replace a corrupted jar file that it has already cached. You can work around this by renaming the jar files (and updating the JNLP file to match) or by using Squid’s client program, squidclient, to purge the affected jars from the cache.
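A purge is an HTTP request with the method PURGE, sent to the proxy for the cached URL. From the command line, squidclient -m PURGE <url> does this. The Python sketch below sends the same request with http.client. The function name, the localhost host, and port 3128 (Squid's default http_port) are assumptions you should adjust. Squid typically answers 200 when it removed the object and 404 when the object was not cached. It usually refuses the request unless squid.conf permits PURGE, for example with acl Purge method PURGE and http_access allow Purge localhost.

```python
import http.client


def purge_from_squid(url: str, proxy_host: str = "localhost",
                     proxy_port: int = 3128, timeout: float = 10.0) -> int:
    """Ask a Squid proxy to drop its cached copy of `url`; return the HTTP status code."""
    conn = http.client.HTTPConnection(proxy_host, proxy_port, timeout=timeout)
    try:
        # A request sent to a proxy carries the absolute URL in the request line;
        # http.client also derives the Host header from it.
        conn.request("PURGE", url)
        response = conn.getresponse()
        response.read()  # drain the body before closing the connection
        return response.status
    finally:
        conn.close()


# Usage (hypothetical URL): purge each jar that was updated.
# status = purge_from_squid("http://webstart.example.com/app/app.jar")
```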