JDK-7062777

Still problems with non-UTF8 zipped file names with non-ASCII characters

    • Type: Bug
    • Resolution: Not an Issue
    • Priority: P4
    • None
    • 7
    • core-libs

      FULL PRODUCT VERSION :
      jdk1.7.0

      ADDITIONAL OS VERSION INFORMATION :
      Windows 6.1.7601

      A DESCRIPTION OF THE PROBLEM :
      Problem Introduction:
      No unzipping method I have tried so far works for zip files whose entry names contain non-ASCII characters; none of them recovers the real names of the zipped files. This is an old, well-known bug that is claimed to be fixed in Java 7. I have tried an early-access build of Java 7 as well as the Apache package org.apache.tools.zip.*, which is presented as a replacement for the pre-Java 7 zip utilities (java.util.zip.*) that adds support for file name encodings other than UTF-8. Neither Java 7 ea nor the Apache solution works. Below I go through 5 ways of reading the zipped file names using java.util.zip.

      Problem description and 5 different attempts:
      I have a zip file named åäö.zip. The name of the zip file itself causes no problems; it is encoded/decoded correctly. It is the names of the zipped files it contains that cause the problem. My zip file contains 2 zipped files named File_1_refäräns.pdf and File_2_dåvälöpment.pdf. I have tried the following methods, jars and JDKs:

      1. Using jdk1.6.0_20 and java.util.zip:
      The following code evaluates the browsed zip file:
      ZipFile zipFile = new ZipFile([myFilePath]);

      for (java.util.Enumeration e = zipFile.entries(); e.hasMoreElements();) {
          ZipEntry zipentry = (ZipEntry) e.nextElement();
          String entryname = zipentry.getName();

      ETC...

      Inspecting the name of the first entry in the debugger shows “entryname = File_1_ref?r?ns.p”, whereas it should be File_1_refäräns.pdf. So it is not even possible to unzip the file correctly, since not even the file extension is read correctly.

      2. Using jdk1.7.0 ea, java.util.zip, java.nio.charset.Charset, encoding ZipFile by UTF-8:
      The following code evaluates the browsed zip file:
      Charset charsetISO = Charset.forName("UTF-8");
      ZipFile zipFile = new ZipFile([myFilePath], charsetISO);

      for (java.util.Enumeration e = zipFile.entries(); e.hasMoreElements();) {
          ZipEntry zipentry = (ZipEntry) e.nextElement();
          String entryname = zipentry.getName();

      ETC...

      This time an Exception is thrown:
      java.lang.IllegalArgumentException: MALFORMED[1]
      at java.util.zip.ZipCoder.toString(ZipCoder.java:53)
      at java.util.zip.ZipFile.getZipEntry(ZipFile.java:500)
      at java.util.zip.ZipFile.access$800(ZipFile.java:53)
      at java.util.zip.ZipFile$2.nextElement(ZipFile.java:482)
      at java.util.zip.ZipFile$2.nextElement(ZipFile.java:452)
      at se.aklagarmyndigheten.alba.library.contenttransfer.importcontent.SipFile.validateZipEntries(SipFile.java:278)
      at se.aklagarmyndigheten.alba.library.contenttransfer.importcontent.SipFile.validate(SipFile.java:214)
      at se.aklagarmyndigheten.alba.library.contenttransfer.importcontent.SipImportContainer.onNextComponent(SipImportContainer.java:164)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:613)
      at com.documentum.web.form.FormProcessor.invokeMethod(FormProcessor.java:1630)

      The SipFile.validateZipEntries frame in the stack trace (marked in red in the original report) points to this line of custom code:
      ZipEntry zipentry = (ZipEntry) e.nextElement();
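
The MALFORMED[1] text has the same shape as the string form of a java.nio.charset.CoderResult. The following is a minimal sketch, not the JDK's own code, of how a strict UTF-8 decoder reacts to an entry name that is not stored as UTF-8. The class and method names are hypothetical, and the byte 0x84 standing in for 'ä' is an assumption taken from the %84 output reported in attempt 4.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the report or of the JDK.
final class StrictUtf8Check {

    // Decodes raw entry-name bytes with a strict UTF-8 decoder and
    // returns the string form of the resulting CoderResult.
    static String decodeResult(byte[] rawName) {
        // A new decoder reports malformed input by default (no replacement).
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        CharBuffer out = CharBuffer.allocate(rawName.length);
        CoderResult result = decoder.decode(ByteBuffer.wrap(rawName), out, true);
        return result.toString();
    }

    public static void main(String[] args) {
        // "ref" + 0x84 + "r": 0x84 is assumed to be the stored byte for 'ä'.
        byte[] notUtf8 = {'r', 'e', 'f', (byte) 0x84, 'r'};
        System.out.println(decodeResult(notUtf8)); // prints "MALFORMED[1]"

        // The same characters stored as real UTF-8 decode cleanly.
        byte[] utf8 = "ref\u00e4r".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeResult(utf8));    // prints "UNDERFLOW"
    }
}
```

A lone 0x84 is a UTF-8 continuation byte with no lead byte, so the decoder flags one malformed byte. This suggests the entry names in the archive are stored in some encoding other than UTF-8.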

      3. Using jdk1.7.0 ea, java.util.zip, java.nio.charset.Charset, encoding ZipFile by ISO-8859-1:
      The following code evaluates the browsed zip file:
      Charset charsetLatin1 = Charset.forName("ISO-8859-1");
      ZipFile zipFile = new ZipFile([myFilePath], charsetLatin1);

      for (java.util.Enumeration e = zipFile.entries(); e.hasMoreElements();) {
          ZipEntry zipentry = (ZipEntry) e.nextElement();
          String entryname = zipentry.getName();

      ETC...

      This time NO Exception is thrown. However, inspecting the name of the first entry in the debugger shows “entryname = File_1_ref?r?ns.pdf”, whereas it should be File_1_refäräns.pdf. Compared to the first attempt (jdk1.6: “File_1_ref?r?ns.p”) this is slightly better: at least the full file extension, .pdf, is read.

      4. Using jdk1.7.0 ea, java.util.zip, java.nio.charset.Charset, encoding both ZipFile and ZipEntry by ISO-8859-1:
      The following code evaluates the browsed zip file:
      Charset charsetLatin1 = Charset.forName("ISO-8859-1");
      ZipFile zipFile = new ZipFile([myFilePath], charsetLatin1);

      for (java.util.Enumeration e = zipFile.entries(); e.hasMoreElements();) {
          ZipEntry zipentry = (ZipEntry) e.nextElement();
          String entrynameEncodeUnsafeAscii = java.net.URLEncoder.encode(zipentry.getName(), "ISO-8859-1");
          String entryname = entrynameEncodeUnsafeAscii;

      ETC...

      This time, inspecting the file names in the debugger shows “entryname = File_1_ref%84r%84ns.pdf” and “entryname = File_2_d%86v%84l%94pment.pdf”, respectively. At least the encoded output now distinguishes between å, ä and ö; in attempts 1-3 those characters were all turned into “?”.
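
The percent-encoded values expose the raw bytes stored for the three letters: 0x86, 0x84 and 0x94. The sketch below decodes those bytes with two charsets. It assumes, and the report does not confirm, that the archive was written with an IBM PC code page such as IBM437, where these bytes map to å, ä and ö. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the report.
final class EntryNameBytes {

    // Decodes raw entry-name bytes with the given charset.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    // Renders each char as a \\uXXXX escape so the output is console-safe.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("\\u%04x", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Bytes taken from the %86, %84 and %94 escapes in the report.
        byte[] raw = {(byte) 0x86, (byte) 0x84, (byte) 0x94};

        // Assumption: the zip tool used code page 437 for entry names.
        String asCp437 = decode(raw, Charset.forName("IBM437"));
        System.out.println(escape(asCp437));  // prints \u00e5\u00e4\u00f6 (å ä ö)

        // ISO-8859-1 maps the same bytes to C1 control characters,
        // which many debuggers and consoles display as "?".
        String asLatin1 = decode(raw, StandardCharsets.ISO_8859_1);
        System.out.println(escape(asLatin1)); // prints \u0086\u0084\u0094
    }
}
```

If that assumption holds, the “?” characters in attempts 3 and 5 would be unprintable control characters rather than lost data, and URLEncoder would simply be writing the original bytes back out.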

      5. Using jdk1.7.0 ea, java.util.zip, java.nio.charset.Charset, encoding both ZipFile and ZipEntry and then decoding ZipEntry by ISO-8859-1:
      The following code evaluates the browsed zip file:
      Charset charsetLatin1 = Charset.forName("ISO-8859-1");
      ZipFile zipFile = new ZipFile([myFilePath], charsetLatin1);

      for (java.util.Enumeration e = zipFile.entries(); e.hasMoreElements();) {
          ZipEntry zipentry = (ZipEntry) e.nextElement();
          String entrynameEncodeUnsafeAscii = java.net.URLEncoder.encode(zipentry.getName(), "ISO-8859-1");
          String entryname = java.net.URLDecoder.decode(entrynameEncodeUnsafeAscii, "ISO-8859-1");

      ETC...
      And then we’re back to entryname = “File_1_ref?r?ns.pdf”.
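
The five attempts can be compared without the original archive by building a small zip in memory. The sketch below writes the two entry names from this report with one charset and reads them back with another. It uses the Java 7 ZipOutputStream and ZipInputStream constructors that take a Charset. IBM437 is an assumed stand-in for whatever encoding the real archive uses, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; not part of the report.
final class ZipNameRoundTrip {

    // Writes an in-memory zip with empty entries, encoding names with 'charset'.
    static byte[] writeZip(Charset charset, String... names) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(out, charset)) {
            for (String name : names) {
                zos.putNextEntry(new ZipEntry(name));
                zos.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return out.toByteArray();
    }

    // Reads all entry names, decoding them with 'charset'.
    static List<String> readNames(byte[] zip, Charset charset) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis =
                 new ZipInputStream(new ByteArrayInputStream(zip), charset)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }

    public static void main(String[] args) {
        Charset cp437 = Charset.forName("IBM437"); // assumed archive encoding
        byte[] zip = writeZip(cp437,
                "File_1_ref\u00e4r\u00e4ns.pdf",
                "File_2_d\u00e5v\u00e4l\u00f6pment.pdf");

        // Same charset on both sides: the names survive intact.
        System.out.println(readNames(zip, cp437).contains("File_1_ref\u00e4r\u00e4ns.pdf"));

        // ISO-8859-1 on the read side: 'ä' comes back as control char U+0084.
        System.out.println(readNames(zip, StandardCharsets.ISO_8859_1)
                .contains("File_1_ref\u0084r\u0084ns.pdf"));
    }
}
```

Both lines print true, which matches the pattern in attempts 3 to 5: reading with the wrong single-byte charset does not fail, it silently yields different characters. Reading with the charset the archive was written in returns the expected names.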

      //EOF

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Please follow the 5 different approaches to reading the file name of a zipped file, described in the previous section.

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      Expected to get correct file names (with å, ä and ö) at unzipping.
      ACTUAL -
      å, ä and ö replaced by either "?" or "%84", "%86" etc.

      ERROR MESSAGES/STACK TRACES THAT OCCUR :
      N/A for 4 of the approaches, but for the second one (again, see problem description):

      java.lang.IllegalArgumentException: MALFORMED[1]
      at java.util.zip.ZipCoder.toString(ZipCoder.java:53)
      at java.util.zip.ZipFile.getZipEntry(ZipFile.java:500)
      at java.util.zip.ZipFile.access$800(ZipFile.java:53)
      at java.util.zip.ZipFile$2.nextElement(ZipFile.java:482)
      at java.util.zip.ZipFile$2.nextElement(ZipFile.java:452)
      at se.aklagarmyndigheten.alba.library.contenttransfer.importcontent.SipFile.validateZipEntries(SipFile.java:278)
      at

      REPRODUCIBILITY :
      This bug can be reproduced always.

      ---------- BEGIN SOURCE ----------
      Out of 5 different approaches (please see description section for source code for all of them), I'm only displaying the one that caused an IllegalArgumentException to be thrown, here:

      Charset charsetISO = Charset.forName("UTF-8");
      ZipFile zipFile = new ZipFile([myFilePath], charsetISO);

      for (java.util.Enumeration e = zipFile.entries(); e.hasMoreElements();) {
          ZipEntry zipentry = (ZipEntry) e.nextElement();
          String entryname = zipentry.getName();

      ...ETC...
      ---------- END SOURCE ----------

      CUSTOMER SUBMITTED WORKAROUND :
      Sorry, no workaround.

      SUPPORT :
      YES

            Assignee: sherman Xueming Shen
            Reporter: webbuggrp Webbug Group