Examine ZipFile slash optimization for non-ASCII compatible charsets

    • Type: Enhancement
    • Resolution: Fixed
    • Priority: P3
    • 15
    • Affects Version/s: None
    • Component/s: core-libs

      ZipFile.getEntry optimizes the check for directory entries by appending a '/' to the encoded byte array. JDK-8242959 improved on this optimization, but also raised the question of whether the optimization is valid in all charsets.

      E.g., UTF-16 would encode '/' (2F) as either 2F 00 (little-endian) or 00 2F (big-endian), which means the hash code would differ and a directory "foo/" would potentially not be found when looking up "foo". Further complications arise if the directory name ends with a code point whose encoding has 2F as its final byte, e.g. \u012F (01 2F in big-endian UTF-16).
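
      The byte sequences mentioned above can be checked with a small standalone program. This is only an illustrative sketch: the class SlashEncodingDemo and its hex helper are hypothetical names and are not part of ZipFile. It uses the explicit UTF_16LE and UTF_16BE charsets so that no byte order mark is emitted.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not JDK code: prints how '/' and U+012F
// are encoded in a few charsets.
public class SlashEncodingDemo {

    // Encodes s with cs and returns the bytes as space-separated hex.
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex("/", StandardCharsets.UTF_8));         // 2F
        System.out.println(hex("/", StandardCharsets.UTF_16LE));      // 2F 00
        System.out.println(hex("/", StandardCharsets.UTF_16BE));      // 00 2F
        // U+012F ends in byte 2F when encoded as big-endian UTF-16.
        System.out.println(hex("\u012F", StandardCharsets.UTF_16BE)); // 01 2F
        System.out.println(hex("\u012F", StandardCharsets.UTF_8));    // C4 AF
    }
}
```

      In UTF-8 the name bytes for "foo" followed by a single 2F byte are exactly the encoding of "foo/", which is why the shortcut is safe there; in either UTF-16 byte order they are not.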

      We should consider doing the low-level optimization only when the charset used is known to be ASCII-compatible, in the sense that '/' is encoded as the single byte 2F. Since more or less all jar files are assumed to be UTF-8, which is compatible in this sense, this should have little effect on performance.
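
      One way such a compatibility check could look is sketched below. This is an assumption about a possible approach, not the fix that was actually applied: the class SlashCompat and the method encodesSlashAsSingleByte are hypothetical names. The sketch only tests how '/' itself is encoded; it does not address the second concern above, where some other code point encodes to a sequence ending in 2F.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not JDK code: decides whether the "append 2F"
// shortcut could be considered for a given charset.
public class SlashCompat {

    // True if '/' is encoded as exactly one byte with value 0x2F.
    static boolean encodesSlashAsSingleByte(Charset cs) {
        byte[] encoded = "/".getBytes(cs);
        return encoded.length == 1 && encoded[0] == 0x2F;
    }

    public static void main(String[] args) {
        System.out.println(encodesSlashAsSingleByte(StandardCharsets.UTF_8));      // true
        System.out.println(encodesSlashAsSingleByte(StandardCharsets.ISO_8859_1)); // true
        System.out.println(encodesSlashAsSingleByte(StandardCharsets.UTF_16LE));   // false
        System.out.println(encodesSlashAsSingleByte(StandardCharsets.UTF_16BE));   // false
    }
}
```

      A caller could evaluate this once per charset and fall back to a slower, charset-aware lookup when it returns false.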

            Assignee:
            Claes Redestad
            Reporter:
            Claes Redestad
            Votes:
            0
            Watchers:
            2

              Created:
              Updated:
              Resolved: