You use the Sun ZIP API (package java.util.zip.*) and run into problems when working with zip files whose entry names contain special characters and/or umlauts, e.g. German umlauts. The ZIP API currently comes with the following restriction:
As long as the names of all entries in a zip file are encoded in UTF-8, you don't face any issues at all. Zip tools like Winzip or PKZip, however, usually encode file names in Cp437. Zip files created with these tools work smoothly as long as no entry name contains special characters or umlauts, because Cp437 and UTF-8 encode plain ASCII characters identically. As soon as an entry name contains such a character, your Java application crashes with an IllegalArgumentException.
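The difference shows up at the byte level. The following is a minimal sketch with hypothetical class and method names. It assumes a JDK whose charset set includes Cp437, as common full JDK builds do. The name "test Ä" from the sample below ends in the single Cp437 byte 0x8E. On its own, that byte is not a valid UTF-8 sequence. UTF-8 encodes Ä as the two bytes 0xC3 0x84 instead.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Cp437VsUtf8 {

    // Returns true if the bytes form valid UTF-8. The decoder reports malformed
    // input instead of silently replacing it with U+FFFD.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder strict = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            strict.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "test \u00C4";                              // "test Ä"
        byte[] cp437 = name.getBytes(Charset.forName("Cp437"));  // as Winzip/PKZip typically store it
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);      // what the ZIP API expects

        System.out.printf("Cp437 last byte: 0x%02X%n", cp437[cp437.length - 1] & 0xFF); // 0x8E
        System.out.printf("UTF-8 last bytes: 0x%02X 0x%02X%n",
                utf8[utf8.length - 2] & 0xFF, utf8[utf8.length - 1] & 0xFF);      // 0xC3 0x84
        System.out.println("Cp437 bytes valid UTF-8? " + isValidUtf8(cp437));     // false
        System.out.println("UTF-8 bytes valid UTF-8? " + isValidUtf8(utf8));      // true
    }
}
```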
This is a known JDK bug that has already been reported in the Java bug database. For further information, have a look at the bug details: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4415733
h1. Sample
The sample zip file test.zip was created with Winzip (Cp437 encoded) and has the following structure:
- test Ä
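You can reproduce the problem without Winzip by building a comparable archive in memory. The following is a minimal sketch with hypothetical class and method names. It assumes Java 7 or later, because it uses the ZipOutputStream(OutputStream, Charset) constructor to store the entry name in Cp437. It then reads the archive back with the plain ZipInputStream(InputStream) constructor, which decodes entry names as UTF-8, and reports the IllegalArgumentException thrown by getNextEntry().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ReproduceMalformedEntry {

    // Builds an in-memory zip with one entry whose name is stored in Cp437,
    // similar to what Winzip or PKZip typically produce (Java 7+ constructor).
    static byte[] buildCp437Zip(String entryName) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes, Charset.forName("Cp437"))) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write("hello".getBytes(StandardCharsets.US_ASCII));
            zos.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Reads all entry names with the default ZipInputStream, which decodes names as UTF-8.
    static List<String> readEntryNames(byte[] zip) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) { // fails for the Cp437-encoded "Ä"
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = buildCp437Zip("test \u00C4"); // "test Ä", as in the sample
        try {
            System.out.println(readEntryNames(zip));
        } catch (IllegalArgumentException e) {
            System.out.println("getNextEntry() failed: " + e);
        }
    }
}
```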
The following Java code snippet unzips the test.zip file (Cp437 encoded) described above and prints all entry names to the console. As soon as getNextEntry() is called for an entry whose name contains special characters or umlauts, your Java application crashes with the IllegalArgumentException shown in the Error section.
h2. Error
As you can see in the stack trace below, the code snippet above crashes in the getNextEntry() method because of the special characters and/or umlauts in the file name.
h1. Solution
h2. Sun JDK 1.4.2
Regular support for Sun J2SE 1.4.2 ended in October 2008. SAP and Sun closed an agreement that enables customers to continue using and receiving support for Sun J2SE 1.4.2 beyond the regular support period. For further details, have a look at the following note: [https://service.sap.com/sap/support/notes/1230512 | https://service.sap.com/sap/support/notes/1230512].
h3. Prerequisites
The solution for handling zip files with special characters and/or umlauts, explained in the following sections, is part of JDK 1.4.2_22. Therefore, make sure that you obtain revision JDK 1.4.2_22 or higher.
h3. Obtain the latest revision of Sun JDK 1.4.2
The note <a href="https://service.sap.com/sap/support/notes/716604" target="_blank">Note 716604 - Access to Sun J2SE and recommended J2SE options</a> explains how to obtain the latest revision of the Sun JDK 1.4.2 as an SAP customer.
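Once the new revision is installed, you can quickly check the running VM. The following is a minimal sketch with hypothetical class and method names. It assumes the usual 1.4.2 update format of the java.version system property, e.g. 1.4.2_22. It only recognises the 1.4.2 line, because the VM parameters described below come with 1.4.2_22. It uses only Java 1.4 language features, so it can also be compiled with a 1.4.2 javac.

```java
public class JdkRevisionCheck {

    // Returns true for a 1.4.2 version string with update 22 or later,
    // e.g. "1.4.2_22" or "1.4.2_30-b05". Anything else (including plain "1.4.2") returns false.
    static boolean isAtLeast1422(String version) {
        String prefix = "1.4.2_";
        if (version == null || !version.startsWith(prefix)) {
            return false;
        }
        // Collect the digits of the update number; stop at a suffix such as "-b05".
        int end = prefix.length();
        while (end < version.length() && Character.isDigit(version.charAt(end))) {
            end++;
        }
        if (end == prefix.length()) {
            return false; // no update number present
        }
        int update = Integer.parseInt(version.substring(prefix.length(), end));
        return update >= 22;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        System.out.println("java.version = " + version
                + ", 1.4.2_22 or later 1.4.2 update: " + isAtLeast1422(version));
    }
}
```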
h3. Solution in detail
With JDK 1.4.2_22, Sun introduced two new VM parameters (system properties) that specify the encoding used for the entry names of zip files. If neither property is set, the behaviour is the same as before. New parameters:
- zip.encoding
- zip.altEncoding
Make sure that you set only one of the two parameters; which one to use depends on your requirements. The parameter zip.encoding can only be set as a VM argument. The parameter zip.altEncoding can be set in either of two ways. You can set it directly as a VM start parameter, e.g. in the NetWeaver AS Java settings via the ConfigTool. Alternatively, you can set it dynamically via System.setProperty(...) as shown below:
As a VM start parameter, e.g. -Dzip.altEncoding=Cp437:
Note: Keep in mind that setting these parameters as VM start parameters in an application server environment affects all other applications running on that server. After setting the parameters for an application server, a server restart is required.
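For the programmatic variant, the following minimal sketch sets zip.altEncoding only around the zip processing and restores the previous value afterwards. It assumes a JDK 1.4.2_22 or later 1.4.2 update, which reads the property at usage time. On other JDKs the property is just an unused system property. The helper withAltEncoding is hypothetical. The code avoids generics, lambdas and System.clearProperty (added in Java 5), so it also fits a 1.4.2 code base.

```java
import java.util.Properties;

public class AltEncodingScope {

    static final String KEY = "zip.altEncoding";

    // Runs the action with zip.altEncoding set to the given encoding, then restores
    // the previous value, or removes the property if it was not set before.
    static void withAltEncoding(String encoding, Runnable action) {
        Properties props = System.getProperties(); // live VM-wide system properties
        String previous = props.getProperty(KEY);
        System.setProperty(KEY, encoding);
        try {
            action.run();
        } finally {
            if (previous == null) {
                props.remove(KEY); // Java 1.4 compatible replacement for System.clearProperty
            } else {
                System.setProperty(KEY, previous);
            }
        }
    }

    public static void main(String[] args) {
        withAltEncoding("Cp437", new Runnable() {
            public void run() {
                // Open the ZipInputStream and call getNextEntry() here.
                System.out.println("inside: " + System.getProperty(KEY));
            }
        });
        System.out.println("after: " + System.getProperty(KEY));
    }
}
```

Other threads still see the changed value while the action runs. This approach therefore narrows the time window in which other applications are affected, but it does not isolate the setting.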
Note: Keep in mind that setting this parameter programmatically in an application server environment can also influence other running applications, because system properties are shared by the whole VM.
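The next two sections describe how the two parameters interact. That interplay can be summarised as the ordered list of encodings that is tried for an entry name. The following is a hypothetical model of the documented rules, not Sun's implementation. Jar files, for which zip.encoding does not apply, are not modelled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ZipEncodingRules {

    // Returns the encodings that are tried for a zip entry name, in order.
    static List<String> encodingsToTry(String zipEncoding, String zipAltEncoding) {
        if (zipEncoding != null) {
            // zip.encoding wins: fixed encoding, no fallback; zip.altEncoding is ignored.
            return Collections.singletonList(zipEncoding);
        }
        List<String> order = new ArrayList<>();
        order.add("UTF-8");               // default behaviour: UTF-8 only
        if (zipAltEncoding != null) {
            order.add(zipAltEncoding);    // fallback, used only if UTF-8 decoding fails
        }
        return order;
    }

    // zip.encoding is fixed at VM start; zip.altEncoding is read at usage time,
    // so this call reflects its current value.
    static List<String> currentEncodingsToTry() {
        return encodingsToTry(System.getProperty("zip.encoding"),
                              System.getProperty("zip.altEncoding"));
    }

    public static void main(String[] args) {
        System.out.println(encodingsToTry(null, null));
        System.out.println(encodingsToTry(null, "Cp437"));
        System.out.println(encodingsToTry("Cp437", "ISO-8859-1"));
        System.out.println(currentEncodingsToTry());
    }
}
```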
h4. Parameter zip.encoding
If the parameter zip.encoding is set, the defined encoding is used in all cases except for jar files. There is no fallback in case of errors. As mentioned above, the parameter can only be set as a VM parameter. If both zip.encoding and zip.altEncoding are set, zip.altEncoding is ignored and not taken into account.
h4. Parameter zip.altEncoding
If the parameter zip.altEncoding is set, UTF-8 is tried first. The alternative encoding is used as a fallback only if decoding with UTF-8 fails. The property is read at usage time, so, as mentioned above, you can also change it at runtime. The property is only taken into account if the zip.encoding parameter isn't set.
h2. Sun JDK 5.0 / 6.0
The solution explained for Sun JDK 1.4.2 (revision 1.4.2_22) will also be made available for JDK 5.0 and JDK 6.0. When the solution will be integrated is currently being clarified with Sun. As a preliminary workaround, you can modify the class java.util.zip.ZipInputStream yourself, at your own risk, as explained in the following section.
h3. Modifying java.util.zip.ZipInputStream
Reimplement the class ZipInputStream and modify its readLOC() method, which reads the local file header of each zip entry. In readLOC(), replace the line ZipEntry e = createZipEntry(getUTF8String(b, 0, len)); with the following code:
Keep the package name java.util.zip in order to have access to the protected and package-private variables and methods of other classes in the package. You can either provide an alternative to ZipInputStream, e.g. a ZipInputStream2 class, or make these changes available to all applications that use ZipInputStream.
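The replacement code depends on JDK-internal members of ZipInputStream (readLOC(), getUTF8String(...), createZipEntry(...)), so it only compiles inside the JDK class. The same logic, which tries UTF-8 first and falls back to Cp437, can be sketched as a standalone program. The method strictUtf8 below is a hypothetical stand-in for getUTF8String. As described in this article, getUTF8String throws an IllegalArgumentException for names that are not valid UTF-8, and strictUtf8 mimics that behaviour.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EntryNameFallback {

    // Hypothetical stand-in for the JDK-internal getUTF8String(b, off, len):
    // decodes strictly and throws IllegalArgumentException on malformed UTF-8.
    static String strictUtf8(byte[] b, int off, int len) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(b, off, len))
                    .toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("MALFORMED", e);
        }
    }

    // Mirrors the patched line in readLOC(): UTF-8 first, Cp437 as fallback.
    static String decodeEntryName(byte[] b, int len) {
        try {
            return strictUtf8(b, 0, len);
        } catch (IllegalArgumentException ex) {
            return new String(b, 0, len, Charset.forName("Cp437"));
        }
    }

    public static void main(String[] args) {
        byte[] fromWinzip = "test \u00C4".getBytes(Charset.forName("Cp437"));
        byte[] utf8 = "test \u00C4".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeEntryName(fromWinzip, fromWinzip.length));
        System.out.println(decodeEntryName(utf8, utf8.length));
    }
}
```

The fallback only triggers on malformed UTF-8. A Cp437 name whose bytes happen to form valid UTF-8 would therefore be decoded as UTF-8, so the approach is a heuristic.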
The implementation above first tries to decode the file name as UTF-8. If that fails, the IllegalArgumentException is caught and the name is decoded with Cp437 instead. Keep in mind that it is important to set the right fallback encoding here, i.e. the encoding used by the tool that created your zip files.
h3. Creating the jar file
Export the modified class as a jar file, name it e.g. fixed.zip.util.jar, and copy it to a location in your file system.
h3. Adding the jar file to the bootclasspath
The jar file with the modified class has to be prepended to your bootclasspath, because classes in java.* packages can only be loaded by the bootstrap class loader. To do that, set the following VM argument:
-Xbootclasspath/p:<fullPathToTheJar e.g. c:\fixed.zip.util.jar>
h2. JDK 7.0
The solution explained for the Sun JDK 1.4.2 with revision 1.4.2_22 will already be part of the upcoming JDK 7.0.
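For reference, the JDK 7 API lets the caller pass the encoding explicitly as a java.nio.charset.Charset, via the constructors ZipInputStream(InputStream, Charset) and ZipFile(File, Charset). According to the JDK documentation, the given charset is ignored for entries whose general purpose flag marks the name as UTF-8. The following is a minimal sketch with hypothetical class and method names. It assumes a Cp437-encoded test.zip as in the sample.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ZipWithCharset {

    // Lists the entry names, decoding them with the given charset (JDK 7+ constructor).
    // Entries whose general purpose flag marks the name as UTF-8 are still decoded as UTF-8.
    static List<String> listEntryNames(InputStream in, Charset charset) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(in, charset)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        File file = new File(args.length > 0 ? args[0] : "test.zip"); // assumed sample archive
        if (!file.isFile()) {
            System.out.println("No zip file found at " + file.getAbsolutePath());
            return;
        }
        for (String name : listEntryNames(new FileInputStream(file), Charset.forName("Cp437"))) {
            System.out.println(name);
        }
    }
}
```

For random access via ZipFile, the equivalent call is new ZipFile(file, Charset.forName("Cp437")).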