JDK-8209576

java.nio.file.Files.writeString writes garbled UTF-16 instead of UTF-8

    • b08
    • x86_64
    • linux_ubuntu

        ADDITIONAL SYSTEM INFORMATION :
        openjdk version "11-ea" 2018-09-25
        OpenJDK Runtime Environment 18.9 (build 11-ea+25)
        OpenJDK 64-Bit Server VM 18.9 (build 11-ea+25, mixed mode)

        A DESCRIPTION OF THE PROBLEM :
        Strings containing certain Unicode characters (see the test) trigger a bug in charset
        conversion in Files.writeString, which can result in data loss. The attached test program outputs:

        ---- OUTPUT FILE: Files.write.txt
        Files.write “Hello” (NOTE: old method)
        Files.readAllBytes: “Hello” (length = 7)
        Files.readAllBytes: [-30, -128, -100, 72, 101, 108, 108, 111, -30, -128, -99]

        ---- OUTPUT FILE: Files.writeString-ASCII.txt
        Files.writeString ASCII
        Files.readString: ASCII (length = 5)
        Files.readAllBytes: [65, 83, 67, 73, 73]

        ---- OUTPUT FILE: Files.writeString-Unicode.txt
        Files.writeString “Hello”
        Files.readString: ..H.e.l.l.o.. (length = 14)
        Files.readAllBytes: [28, 32, 72, 0, 101, 0, 108, 0, 108, 0, 111, 0, 29, 32]

        EXPECTED VERSUS ACTUAL BEHAVIOR :
        EXPECTED -
        Text written in UTF-8 as documented
        ACTUAL -
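        Text written in what appears to be UTF-16LE without a byte order mark (see the Files.readAllBytes output for Files.writeString-Unicode.txt above)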
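
The two byte dumps in the output above can be checked against explicit encodings of the test string. The following is a minimal sketch, not part of the original report; the class name `EncodingCompare` and its helper methods are hypothetical. It encodes "\u201CHello\u201D" with `String.getBytes` using `StandardCharsets.UTF_8` and `StandardCharsets.UTF_16LE` so the results can be compared with the dumps for Files.write.txt and Files.writeString-Unicode.txt.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, not part of the original test program.
final class EncodingCompare {

    // Encodes the string as UTF-8 (the documented default of Files.writeString).
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Encodes the string as UTF-16LE (little-endian, no byte order mark).
    static byte[] utf16le(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        String text = "\u201CHello\u201D"; // same string as in the report
        System.out.println("UTF-8:    " + Arrays.toString(utf8(text)));
        System.out.println("UTF-16LE: " + Arrays.toString(utf16le(text)));
    }
}
```

The UTF-8 result should match the 11 bytes read back from Files.write.txt, and the UTF-16LE result should match the 14 bytes read back from Files.writeString-Unicode.txt, which would suggest that the affected code path stores the string's internal two-byte representation unchanged.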

        ---------- BEGIN SOURCE ----------
        import java.nio.file.*;
        import java.util.Arrays;

        public class Test {

            public static void main(String... args) throws Exception {
                final String text = "\u201CHello\u201D"; // <-- quotation char causes problem
                //final String text = "\u017Cółw"; // some other Unicode chars don't cause this problem

                oldWrite(Path.of("Files.write.txt"), text); // OK
                newWrite(Path.of("Files.writeString-ASCII.txt"), "ASCII"); // OK
                newWrite(Path.of("Files.writeString-Unicode.txt"), text); // <-- BUG
            }

            static void oldWrite(Path output, String text) throws Exception {
                System.out.println();
                System.out.println("---- OUTPUT FILE: " + output);

                System.out.println("Files.write " + text);
                Files.write(output, text.getBytes("UTF-8"));
                String actual = new String(Files.readAllBytes(output), "UTF-8");
                System.out.println("Files.readAllBytes: " + actual + " (length = " + actual.length() + ")");
                System.out.println("Files.readAllBytes: " + Arrays.toString(Files.readAllBytes(output)));
            }

            static void newWrite(Path output, String text) throws Exception {
                System.out.println();
                System.out.println("---- OUTPUT FILE: " + output);

                System.out.println("Files.writeString " + text);
                Files.writeString(output, text); // <-- writes UTF-16 instead of UTF-8
                String actual = Files.readString(output);
                System.out.println("Files.readString: " + actual + " (length = " + actual.length() + ")");
                System.out.println("Files.readAllBytes: " + Arrays.toString(Files.readAllBytes(output)));
            }

        }
        ---------- END SOURCE ----------

        FREQUENCY : always
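
Since the test shows that the older `Files.write` path produces correct UTF-8, one possible interim workaround on an affected build is to encode and decode explicitly. This is only a sketch under that assumption and is not taken from the report; the class `Utf8Workaround` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical workaround helper, mirroring oldWrite from the test program.
final class Utf8Workaround {

    // Encodes the text explicitly as UTF-8, then writes the raw bytes.
    static void writeUtf8(Path output, String text) throws IOException {
        Files.write(output, text.getBytes(StandardCharsets.UTF_8));
    }

    // Reads the raw bytes and decodes them explicitly as UTF-8.
    static String readUtf8(Path input) throws IOException {
        return new String(Files.readAllBytes(input), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("utf8-workaround", ".txt");
        try {
            String text = "\u201CHello\u201D"; // same string as in the report
            writeUtf8(tmp, text);
            System.out.println("bytes on disk: " + Files.readAllBytes(tmp).length);
            System.out.println("round trip ok: " + text.equals(readUtf8(tmp)));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

This avoids `Files.writeString` and `Files.readString` entirely, so it does not exercise the code path that the report identifies as faulty.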


              joehw Joe Wang
              webbuggrp Webbug Group