- Bug
- Resolution: Fixed
- P3
- 11
- b08
- x86_64
- linux_ubuntu
Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-8211839 | 11.0.2 | Deepak Kejriwal | P3 | Resolved | Fixed | b02 |
ADDITIONAL SYSTEM INFORMATION :
openjdk version "11-ea" 2018-09-25
OpenJDK Runtime Environment 18.9 (build 11-ea+25)
OpenJDK 64-Bit Server VM 18.9 (build 11-ea+25, mixed mode)
A DESCRIPTION OF THE PROBLEM :
Certain Unicode characters (see the test) trigger a charset conversion bug in
Files.writeString that can result in data loss. The attached test program outputs:
---- OUTPUT FILE: Files.write.txt
Files.write “Hello” (NOTE: old method)
Files.readAllBytes: “Hello” (length = 7)
Files.readAllBytes: [-30, -128, -100, 72, 101, 108, 108, 111, -30, -128, -99]
---- OUTPUT FILE: Files.writeString-ASCII.txt
Files.writeString ASCII
Files.readString: ASCII (length = 5)
Files.readAllBytes: [65, 83, 67, 73, 73]
---- OUTPUT FILE: Files.writeString-Unicode.txt
Files.writeString “Hello”
Files.readString: ..H.e.l.l.o.. (length = 14)
Files.readAllBytes: [28, 32, 72, 0, 101, 0, 108, 0, 108, 0, 111, 0, 29, 32]
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Text written in UTF-8 as documented
ACTUAL -
Text written as UTF-16 code units in little-endian byte order, without a BOM (see the byte dump for Files.writeString-Unicode.txt)
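The byte dumps in the reported output can be checked by hand. The following is a minimal sketch, not part of the original report: it decodes the bytes listed for Files.writeString-Unicode.txt as UTF-16LE and prints the UTF-8 bytes that were expected instead. The class name DecodeObservedBytes and its helper methods are hypothetical names chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DecodeObservedBytes {
    // Bytes copied from the reported output for Files.writeString-Unicode.txt.
    static final byte[] OBSERVED = {28, 32, 72, 0, 101, 0, 108, 0, 108, 0, 111, 0, 29, 32};

    // Interprets the bytes as UTF-16 code units in little-endian order.
    static String decodeAsUtf16Le(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16LE);
    }

    // Produces the bytes a correct UTF-8 write would have stored.
    static byte[] expectedUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "\u201CHello\u201D";
        // True if the observed bytes are exactly the UTF-16LE form of the test string.
        System.out.println(decodeAsUtf16Le(OBSERVED).equals(text));
        // The UTF-8 form, for comparison with the Files.write.txt byte dump.
        System.out.println(Arrays.toString(expectedUtf8(text)));
    }
}
```

The pair 28, 32 is 0x1C, 0x20, which is U+201C with the low byte first, and the 72, 0 that follows is "H" padded to two bytes. That pattern is what distinguishes UTF-16LE from the three-byte UTF-8 sequence -30, -128, -100 seen in the Files.write.txt dump.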
---------- BEGIN SOURCE ----------
import java.nio.file.*;
import java.util.Arrays;

public class Test {
    public static void main(String... args) throws Exception {
        final String text = "\u201CHello\u201D"; // <-- quotation char causes problem
        //final String text = "\u017Cółw"; // some other Unicode chars don't cause this problem
        oldWrite(Path.of("Files.write.txt"), text); // OK
        newWrite(Path.of("Files.writeString-ASCII.txt"), "ASCII"); // OK
        newWrite(Path.of("Files.writeString-Unicode.txt"), text); // <-- BUG
    }

    static void oldWrite(Path output, String text) throws Exception {
        System.out.println();
        System.out.println("---- OUTPUT FILE: " + output);
        System.out.println("Files.write " + text);
        Files.write(output, text.getBytes("UTF-8"));
        String actual = new String(Files.readAllBytes(output), "UTF-8");
        System.out.println("Files.readAllBytes: " + actual + " (length = " + actual.length() + ")");
        System.out.println("Files.readAllBytes: " + Arrays.toString(Files.readAllBytes(output)));
    }

    static void newWrite(Path output, String text) throws Exception {
        System.out.println();
        System.out.println("---- OUTPUT FILE: " + output);
        System.out.println("Files.writeString " + text);
        Files.writeString(output, text); // <-- writes UTF-16 instead of UTF-8
        String actual = Files.readString(output);
        System.out.println("Files.readString: " + actual + " (length = " + actual.length() + ")");
        System.out.println("Files.readAllBytes: " + Arrays.toString(Files.readAllBytes(output)));
    }
}
---------- END SOURCE ----------
FREQUENCY : always
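Since the report marks the Files.write path as OK, one possible way to sidestep the problem on an affected build is to encode the string explicitly and write the raw bytes. The sketch below is an illustration under that assumption, not a fix endorsed in this issue. Utf8WriteWorkaround, writeUtf8 and readUtf8 are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8WriteWorkaround {
    // Hypothetical helper: encodes to UTF-8 first, then writes the bytes,
    // mirroring the oldWrite path that the report shows working.
    static void writeUtf8(Path output, String text) throws IOException {
        Files.write(output, text.getBytes(StandardCharsets.UTF_8));
    }

    // Hypothetical helper: reads the raw bytes and decodes them as UTF-8.
    static String readUtf8(Path input) throws IOException {
        return new String(Files.readAllBytes(input), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("utf8-workaround", ".txt");
        try {
            String text = "\u201CHello\u201D";
            writeUtf8(tmp, text);
            // Round-trip check: the text read back should equal the text written.
            System.out.println(readUtf8(tmp).equals(text));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Using StandardCharsets.UTF_8 rather than the "UTF-8" charset name avoids the checked UnsupportedEncodingException, which keeps the helper signatures limited to IOException.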
- backported by: JDK-8211839 java.nio.file.Files.writeString writes garbled UTF-16 instead of UTF-8 (Resolved)
- relates to: JDK-8211773 JDK 11 GA fails JCK11 test: UTF-8/UTF-16 String data conversion problem (Closed)