JDK-8320570: NegativeArraySizeException decoding >1G UTF8 bytes with non-ascii characters


    • Resolved In Build: b27
    • CPU: generic
    • OS: generic
    • Verification: Verified

        ADDITIONAL SYSTEM INFORMATION :
        Generic. Reproduced on both Linux and macOS (M1).

        A DESCRIPTION OF THE PROBLEM :
        A regression was found in Java 9+ when creating a String instance from UTF-8 bytes. It is a side effect of compact strings (JEP 254, https://openjdk.org/jeps/254), which changed the decoding logic. Specifically, when constructing a string from bytes:

        ```
        String str = new String(largeBytes, StandardCharsets.UTF_8);
        ```

        If largeBytes is larger than 2^30 bytes (>1 GB) but smaller than Integer.MAX_VALUE (~2 GB) and contains non-ASCII characters, the constructor fails on Java 9+ (including 11, 17 and 21; the stack traces differ slightly, see below) regardless of the JVM heap size. On Java 8 it succeeds when the JVM heap is large enough.
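        The negative size in the traces below (-1894967266) is consistent with the decoder sizing a UTF-16 buffer as twice the input length (two bytes per char) using 32-bit int arithmetic. The sketch below only checks that arithmetic under this assumption. The input length 1,200,000,015 is inferred from the exception value, not stated in the report, and `OverflowCheck` / `utf16BufferSize` are hypothetical names, not JDK code.

        ```java
        public class OverflowCheck {
            // Hypothetical helper: bytes needed for a UTF-16 buffer of inputLength chars,
            // computed with int arithmetic the way an overflowing decoder might.
            static int utf16BufferSize(int inputLength) {
                return inputLength << 1; // wraps negative once inputLength >= 1 << 30
            }

            public static void main(String[] args) {
                int len = 1_200_000_015; // inferred ~1.2 GB input length (assumption)
                System.out.println(utf16BufferSize(len));
                // The true size, computed in long, exceeds Integer.MAX_VALUE.
                System.out.println(Math.multiplyExact(2L, (long) len));
            }
        }
        ```

        Any length from 2^30 up to Integer.MAX_VALUE doubles past Integer.MAX_VALUE, which matches the ">1 GB but <2 GB" window described above.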

        REGRESSION : Last worked in version 8u391

        STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
        // largeBytes is a byte array of ~1.2 GB that contains UTF-8 encoded non-ASCII characters
        String str = new String(largeBytes, StandardCharsets.UTF_8);
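        The linked gist is not reproduced here. The following is a hypothetical stand-alone reproducer: `Repro` and `buildInput` are illustrative names, and the input layout (ASCII filler plus one '€') is an assumption. A character outside Latin-1 is used on the assumption that the decoder then cannot keep the compact LATIN1 form and must fall back to UTF-16.

        ```java
        import java.nio.charset.StandardCharsets;
        import java.util.Arrays;

        public class Repro {
            // Hypothetical input: one encoded U+20AC ('€', bytes E2 82 AC) followed by
            // ASCII 'a' bytes up to the requested total size.
            static byte[] buildInput(int size) {
                byte[] euro = "\u20AC".getBytes(StandardCharsets.UTF_8);
                if (size < euro.length) {
                    throw new IllegalArgumentException("size too small: " + size);
                }
                byte[] bytes = new byte[size];
                Arrays.fill(bytes, (byte) 'a');
                System.arraycopy(euro, 0, bytes, 0, euro.length);
                return bytes;
            }

            public static void main(String[] args) {
                // ~1.2 GB by default; needs a large heap just to allocate the input,
                // e.g. java -Xmx8G Repro
                int size = args.length > 0 ? Integer.parseInt(args[0]) : 1_200_000_000;
                byte[] largeBytes = buildInput(size);
                String str = new String(largeBytes, StandardCharsets.UTF_8);
                System.out.println("decoded " + str.length() + " chars");
            }
        }
        ```

        Passing a small size (for example `java Repro 10`) exercises the same code path without the large allocation.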


        EXPECTED VERSUS ACTUAL BEHAVIOR :
        EXPECTED -
        The string is constructed successfully when the JVM heap is large enough (e.g. java -Xms5G -Xmx8G).
        ACTUAL -

        Java8:
        ```
        $ java -Xms5G -Xmx8G org/example/Main
        (succeeded)
        ```

        Java11 (regardless of heap size):
        ```
        $ java org/example/Main
        Exception in thread "main" java.lang.NegativeArraySizeException: -1894967266
        at java.base/java.lang.StringCoding.decodeUTF8_0(StringCoding.java:777)
        at java.base/java.lang.StringCoding.decodeUTF8(StringCoding.java:734)
        at java.base/java.lang.StringCoding.decode(StringCoding.java:257)
        at java.base/java.lang.String.<init>(String.java:507)
        at java.base/java.lang.String.<init>(String.java:561)
        at org.example.Main.main(Main.java:28)
        ```

        Java17 (regardless of heap size):
        ```
        $ java org/example/Main
        Exception in thread "main" java.lang.NegativeArraySizeException: -1894967266
        at java.base/java.lang.String.<init>(String.java:568)
        at java.base/java.lang.String.<init>(String.java:1387)
        at org.example.Main.main(Main.java:28)
        ```

        Java21 (regardless of heap size):
        ```
        $ java org/example/Main
        Exception in thread "main" java.lang.NegativeArraySizeException: -1894967266
        at java.base/java.lang.String.<init>(String.java:577)
        at java.base/java.lang.String.<init>(String.java:1425)
        at org.example.Main.main(Main.java:28)
        ```

        Java8, default heap size:
        ```
        $ java org/example/Main
        Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.lang.StringCoding.decode(StringCoding.java:215)
        at java.lang.String.<init>(String.java:463)
        at java.lang.String.<init>(String.java:515)
        at org.example.Main.main(Main.java:28)
        ```

        ---------- BEGIN SOURCE ----------
        https://gist.github.com/Abacn/e8fda767f53e723db6d71f21f4db2187
        ---------- END SOURCE ----------

        CUSTOMER SUBMITTED WORKAROUND :
        Possibly works: split the byte array at a character boundary (so no multi-byte UTF-8 sequence is cut in half), construct a String from each part, then concatenate them.
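        The workaround above could be sketched as follows, assuming well-formed UTF-8 input. `ChunkedUtf8` and `chunkSize` are hypothetical names. Each split point is moved back so it never lands on a continuation byte (bit pattern 10xxxxxx), which keeps multi-byte sequences and surrogate pairs intact. Under compact strings, a String containing non-Latin-1 characters stores two bytes per char, so the concatenated result still has to fit in a single array. This only helps when the decoded text is small enough for that.

        ```java
        import java.nio.charset.StandardCharsets;

        public class ChunkedUtf8 {
            // Decode UTF-8 in chunks of at most chunkSize bytes, splitting only
            // in front of a lead byte so no code point is divided.
            static String decode(byte[] bytes, int chunkSize) {
                if (chunkSize < 4) { // a UTF-8 sequence is at most 4 bytes
                    throw new IllegalArgumentException("chunkSize must be >= 4");
                }
                StringBuilder sb = new StringBuilder();
                int start = 0;
                while (start < bytes.length) {
                    int limit = (int) Math.min((long) start + chunkSize, bytes.length);
                    int end = limit;
                    // Back up while the byte at the split point is a continuation byte.
                    while (end > start && end < bytes.length && (bytes[end] & 0xC0) == 0x80) {
                        end--;
                    }
                    if (end == start) {
                        end = limit; // malformed input with no lead byte: split anyway
                    }
                    sb.append(new String(bytes, start, end - start, StandardCharsets.UTF_8));
                    start = end;
                }
                return sb.toString();
            }
        }
        ```

        Decoding by offset and length with `new String(bytes, offset, length, charset)` avoids copying each chunk into a separate array before decoding.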

        FREQUENCY : always


              Assignee: Jim Laskey (jlaskey)
              Reporter: Webbug Group (webbuggrp)