JDK-8320570

NegativeArraySizeException decoding >1G UTF8 bytes with non-ascii characters


Details

    • b27
    • generic
    • generic
    • Verified


      Description

        ADDITIONAL SYSTEM INFORMATION :
        Generic. Reproduced on both Linux and macOS (M1).

        A DESCRIPTION OF THE PROBLEM :
        A regression in Java 9+ affects creating a String instance from UTF-8 bytes. It is a side effect of compact strings (https://openjdk.org/jeps/254), which changed the decoding logic. Specifically, when constructing a string from bytes:

        ```
        String str = new String(largeBytes, StandardCharsets.UTF_8);
        ```

        If largeBytes contains non-ASCII characters and is larger than 2^30 bytes (>1 GB) but smaller than Integer.MAX_VALUE (2 GB), construction fails on Java 9+ (including 11, 17, and 21, though the stack traces differ slightly, see below), regardless of JVM heap size. On Java 8, it succeeds when the JVM heap size is sufficient.
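
The negative size in the traces below is consistent with an int length being doubled to size a two-bytes-per-char buffer. The following is a minimal sketch of that arithmetic only, not the JDK's actual code; the class name, the `doubled` helper, and the input length of 1,200,000,015 bytes are assumptions chosen for illustration.

```java
public class DoublingOverflow {
    // Assumption: the decoder sizes its output as (input length * 2) using int math.
    static int doubled(int length) {
        return length << 1;
    }

    public static void main(String[] args) {
        int length = 1_200_000_015; // hypothetical input size, roughly 1.2 GB

        // int arithmetic wraps around: prints -1894967266
        System.out.println(doubled(length));

        // long arithmetic shows the intended size: prints 2400000030
        System.out.println(2L * length);

        // 2^30 is the smallest length whose doubled value wraps: prints -2147483648
        System.out.println(doubled(1 << 30));
    }
}
```

Under this assumption, any length of 2^30 or more yields a negative array size before the heap is ever consulted, which would explain why the failure does not depend on heap settings.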

        REGRESSION : Last worked in version 8u391

        STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
        // largeBytes is a byte array of ~1.2 GB that contains encoded non-ASCII characters
        String str = new String(largeBytes, StandardCharsets.UTF_8);
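
For readers without access to the linked source, one way to build such an input might look like the sketch below. It is not the reporter's program: the class name, the `payload` helper, the choice of U+20AC as the non-ASCII character, and the exact size are all assumptions. Running `main` needs several gigabytes of heap.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LargeUtf8Repro {
    // Builds `size` bytes of valid UTF-8: ASCII 'a' padding followed by one
    // three-byte character (U+20AC) at the end. Requires size >= 3.
    static byte[] payload(int size) {
        byte[] euro = "\u20AC".getBytes(StandardCharsets.UTF_8);
        byte[] bytes = new byte[size];
        Arrays.fill(bytes, (byte) 'a');
        System.arraycopy(euro, 0, bytes, size - euro.length, euro.length);
        return bytes;
    }

    public static void main(String[] args) {
        // Hypothetical size between 2^30 and Integer.MAX_VALUE (about 1.2 GB).
        byte[] largeBytes = payload(1_200_000_015);
        String str = new String(largeBytes, StandardCharsets.UTF_8);
        System.out.println(str.length());
    }
}
```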


        EXPECTED VERSUS ACTUAL BEHAVIOR :
        EXPECTED -
        The string is constructed successfully when run with sufficient JVM heap size (java -Xms5G -Xmx8G).
        ACTUAL -

        Java8:
        ```
        $ java -Xms5G -Xmx8G org/example/Main
        (succeeded)
        ```

        Java11 (regardless of heap size):
        ```
        $ java org/example/Main
        Exception in thread "main" java.lang.NegativeArraySizeException: -1894967266
        at java.base/java.lang.StringCoding.decodeUTF8_0(StringCoding.java:777)
        at java.base/java.lang.StringCoding.decodeUTF8(StringCoding.java:734)
        at java.base/java.lang.StringCoding.decode(StringCoding.java:257)
        at java.base/java.lang.String.<init>(String.java:507)
        at java.base/java.lang.String.<init>(String.java:561)
        at org.example.Main.main(Main.java:28)
        ```

        Java17 (regardless of heap size):
        ```
        $ java org/example/Main
        Exception in thread "main" java.lang.NegativeArraySizeException: -1894967266
        at java.base/java.lang.String.<init>(String.java:568)
        at java.base/java.lang.String.<init>(String.java:1387)
        at org.example.Main.main(Main.java:28)
        ```

        Java21 (regardless of heap size):
        ```
        $ java org/example/Main
        Exception in thread "main" java.lang.NegativeArraySizeException: -1894967266
        at java.base/java.lang.String.<init>(String.java:577)
        at java.base/java.lang.String.<init>(String.java:1425)
        at org.example.Main.main(Main.java:28)
        ```

        Java8, default heap size:
        ```
        $ java org/example/Main
        Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.lang.StringCoding.decode(StringCoding.java:215)
        at java.lang.String.<init>(String.java:463)
        at java.lang.String.<init>(String.java:515)
        at org.example.Main.main(Main.java:28)
        ```

        ---------- BEGIN SOURCE ----------
        https://gist.github.com/Abacn/e8fda767f53e723db6d71f21f4db2187
        ---------- END SOURCE ----------

        CUSTOMER SUBMITTED WORKAROUND :
        May work: split the byte array, construct two strings, then concatenate them.
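
A split has to avoid cutting a multi-byte UTF-8 sequence in half, otherwise each part decodes with a replacement character at the seam. The sketch below shows one way this could be done, assuming the input is valid UTF-8; the class and method names are hypothetical, and the report does not establish whether the concatenated result fits in a single String, so the two parts are returned separately.

```java
import java.nio.charset.StandardCharsets;

public class SplitDecode {
    // Returns an index at or before `target` that is not inside a
    // multi-byte UTF-8 sequence (assumes valid UTF-8 input).
    static int safeSplit(byte[] bytes, int target) {
        int i = Math.min(Math.max(target, 0), bytes.length);
        // Continuation bytes look like 10xxxxxx; back up until a lead byte is found.
        while (i > 0 && i < bytes.length && (bytes[i] & 0xC0) == 0x80) {
            i--;
        }
        return i;
    }

    // Decodes the array as two strings split near the middle.
    static String[] decodeInTwoParts(byte[] bytes) {
        int mid = safeSplit(bytes, bytes.length / 2);
        String head = new String(bytes, 0, mid, StandardCharsets.UTF_8);
        String tail = new String(bytes, mid, bytes.length - mid, StandardCharsets.UTF_8);
        return new String[] { head, tail };
    }
}
```

Each half of a ~1.2 GB array is well below 2^30 bytes, so each part stays under the size at which the failure is reported.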

        FREQUENCY : always


              People

                jlaskey Jim Laskey
                webbuggrp Webbug Group