- Bug
- Resolution: Fixed
- P4
- 5.0
- b57
- generic
- generic
java/util/zip/ZipOutputStream.java has its own implementation of UTF-8 encoding
that does not take surrogates into account:
private static byte[] getUTF8Bytes(String s) {
    char[] c = s.toCharArray();
    int len = c.length;
    // Count the number of encoded bytes...
    int count = 0;
    for (int i = 0; i < len; i++) {
        int ch = c[i];
        if (ch <= 0x7f) {
            count++;
        } else if (ch <= 0x7ff) {
            count += 2;
        } else {
            count += 3;
        }
    }
    // Now return the encoded bytes...
    byte[] b = new byte[count];
    int off = 0;
    for (int i = 0; i < len; i++) {
        int ch = c[i];
        if (ch <= 0x7f) {
            b[off++] = (byte)ch;
        } else if (ch <= 0x7ff) {
            b[off++] = (byte)((ch >> 6) | 0xc0);
            b[off++] = (byte)((ch & 0x3f) | 0x80);
        } else {
            b[off++] = (byte)((ch >> 12) | 0xe0);
            b[off++] = (byte)(((ch >> 6) & 0x3f) | 0x80);
            b[off++] = (byte)((ch & 0x3f) | 0x80);
        }
    }
    return b;
}
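To make the effect concrete, here is a minimal sketch comparing the per-char logic quoted above with the JDK's standard UTF-8 encoder. The class name SurrogateUtf8Demo and the method names legacyEncode and standardEncode are hypothetical and exist only for this illustration; legacyEncode is meant to mirror the branch structure of getUTF8Bytes, not to reproduce the actual ZipOutputStream source.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of java.util.zip.
final class SurrogateUtf8Demo {

    // Encodes each UTF-16 char on its own, following the same three
    // branches as the getUTF8Bytes method quoted in this report.
    static byte[] legacyEncode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            int ch = s.charAt(i);
            if (ch <= 0x7f) {
                out.write(ch);
            } else if (ch <= 0x7ff) {
                out.write((ch >> 6) | 0xc0);
                out.write((ch & 0x3f) | 0x80);
            } else {
                // Surrogates (0xD800..0xDFFF) land here, so each half of a
                // pair becomes its own three-byte sequence.
                out.write((ch >> 12) | 0xe0);
                out.write(((ch >> 6) & 0x3f) | 0x80);
                out.write((ch & 0x3f) | 0x80);
            }
        }
        return out.toByteArray();
    }

    // Standard UTF-8 as produced by the JDK charset encoder.
    static byte[] standardEncode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String clef = "\uD834\uDD1E"; // U+1D11E, one supplementary character
        System.out.println("per-char encoding: " + legacyEncode(clef).length + " bytes");
        System.out.println("standard UTF-8:    " + standardEncode(clef).length + " bytes");
    }
}
```

For characters in the Basic Multilingual Plane the two methods agree; they diverge only when the string contains a surrogate pair, where the per-char approach emits two three-byte sequences instead of one four-byte sequence.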
-----------------------------------------------------------
Also, Norbert Lindenberg noted:
I did notice another thing that looks fishy:
src/share/native/java/util/zip/ZipFile.c has calls to the JNI routines
GetStringUTFLength and GetStringUTFRegion, apparently also to handle
file names. These are probably wrong, because JNI uses modified UTF-8
and zip/jar files should use standard UTF-8.
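The difference between modified and standard UTF-8 can be observed from pure Java, without JNI. The sketch below assumes that the modified UTF-8 written by DataOutputStream.writeUTF is the same format the JNI string routines use; the class name ModifiedUtf8Demo and its method names are hypothetical and serve only this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; it does not call into ZipFile.c or JNI.
final class ModifiedUtf8Demo {

    // Returns the modified UTF-8 bytes of s as written by writeUTF,
    // with the two-byte length prefix stripped.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // Not expected for an in-memory stream and short strings.
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Standard UTF-8 as produced by the JDK charset encoder.
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String supplementary = "\uD834\uDD1E"; // U+1D11E
        System.out.println("modified UTF-8: " + modifiedUtf8(supplementary).length + " bytes");
        System.out.println("standard UTF-8: " + standardUtf8(supplementary).length + " bytes");
    }
}
```

The two encodings differ in exactly two cases: U+0000 becomes the two-byte sequence C0 80 in modified UTF-8, and supplementary characters are written as two three-byte surrogate sequences rather than one four-byte sequence. File names containing neither are encoded identically.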
            
Relates to:
- JDK-4244499 ZipEntry() does not convert filenames from Unicode to platform (Resolved)
- JDK-5030283 Incorrect implementation of UTF-8 in zip package (Resolved)