-
Enhancement
-
Resolution: Fixed
-
P3
-
None
-
b20
ZipFile.getEntry optimizes the check for directory entries by appending a '/' to the encoded byte array. JDK-8242959 improved this optimization, but also raised the question of whether it is valid in all charsets.
For example, UTF-16 encodes '/' (2F) as either 2F 00 or 00 2F, so the hash code would differ and a directory "foo/" might not be found when looking up "foo". Further complications arise if the directory name ends with a code point whose encoding has 2F as its final byte, e.g. \u012F.
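The byte sequences mentioned above can be reproduced with a small standalone program. This is only an illustrative sketch: the class name SlashEncodingDemo and the hex helper are made up for this example and are not part of ZipFile.

```java
import java.nio.charset.StandardCharsets;

class SlashEncodingDemo {
    // Hypothetical helper: formats bytes as space-separated upper-case hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // '/' is a single byte in UTF-8 but two bytes in UTF-16.
        System.out.println(hex("/".getBytes(StandardCharsets.UTF_8)));     // 2F
        System.out.println(hex("/".getBytes(StandardCharsets.UTF_16LE)));  // 2F 00
        System.out.println(hex("/".getBytes(StandardCharsets.UTF_16BE)));  // 00 2F

        // U+012F is not a slash, yet its UTF-16BE encoding ends in byte 2F.
        System.out.println(hex("\u012F".getBytes(StandardCharsets.UTF_16BE))); // 01 2F
    }
}
```

A byte-level check for a trailing 2F would therefore miss the directory in the UTF-16 case and could misread the last byte of a name ending in \u012F as a slash.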
We should consider applying the low-level optimization only when the charset is known to be ASCII compatible, in the sense that '/' is encoded as the single byte 2F. Since nearly all jar files are assumed to use UTF-8, which is compatible in this sense, this should have little effect on performance.
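One way such a compatibility test might look is sketched below. The class SlashCompat and the method encodesSlashAsSingleByte are hypothetical names for illustration, not the actual JDK change. The sketch only checks how '/' itself is encoded; it does not establish whether 2F can also occur as the trailing byte of some multi-byte sequence in a given charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class SlashCompat {
    // Hypothetical check: true if the charset encodes '/' as the single byte 0x2F.
    static boolean encodesSlashAsSingleByte(Charset cs) {
        byte[] encoded = "/".getBytes(cs);
        return encoded.length == 1 && encoded[0] == 0x2F;
    }

    public static void main(String[] args) {
        System.out.println(encodesSlashAsSingleByte(StandardCharsets.UTF_8));      // true
        System.out.println(encodesSlashAsSingleByte(StandardCharsets.ISO_8859_1)); // true
        System.out.println(encodesSlashAsSingleByte(StandardCharsets.UTF_16LE));   // false
        System.out.println(encodesSlashAsSingleByte(StandardCharsets.UTF_16BE));   // false
    }
}
```

A caller could evaluate this once per charset and fall back to a slower, string-level lookup when it returns false, so the common UTF-8 path keeps the fast byte-level check.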