Summary
javac will now correctly reject programs that contain character literals with multi-surrogate characters.
Problem
Certain Unicode characters cannot be represented by a single char and are instead represented as two char values (a surrogate pair). In Java, a character literal can hold only one char value, not two. However, javac currently accepts code like:
char c = '😀';
javac uses only the first char of the surrogate pair and silently ignores the second, leading to incorrect and surprising behavior.
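The effect of keeping only the first surrogate can be sketched in plain Java. This is an illustration only, not javac's code: the class name, the helper firstCodeUnit, and the choice of U+1F600 as the sample supplementary code point are assumptions made for the example.

```java
// Sketch: why keeping only the first UTF-16 code unit loses information.
class SurrogateTruncationDemo {

    // Hypothetical helper: returns the first UTF-16 code unit of a code point.
    static char firstCodeUnit(int codePoint) {
        return Character.toChars(codePoint)[0];
    }

    public static void main(String[] args) {
        int codePoint = 0x1F600; // assumed sample: a supplementary code point

        // A supplementary code point needs two char values.
        char[] units = Character.toChars(codePoint);
        System.out.println(units.length); // prints 2

        // Dropping the second unit leaves a lone high surrogate,
        // which no longer identifies the original character.
        char first = firstCodeUnit(codePoint);
        System.out.println(Character.isHighSurrogate(first)); // prints true
        System.out.println(Integer.toHexString(first));       // prints d83d
        System.out.println(first == codePoint);               // prints false
    }
}
```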
Solution
javac will produce an error when a character literal contains a Unicode character that cannot be represented by a single char value.
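Code that relied on the old behavior will need another way to hold such a character. A minimal sketch of two standard options follows: a String, or an int code point. The class and helper names are hypothetical, and the surrogate pair for U+1F600 is an assumed sample value.

```java
// Sketch: holding a supplementary character without a char literal.
class SupplementaryAlternatives {

    // Hypothetical helper: reads the first code point of a string as an int.
    static int firstCodePoint(String s) {
        return s.codePointAt(0);
    }

    public static void main(String[] args) {
        // Option 1: a String. The pair is written with escapes to keep the source ASCII-only.
        String s = "\uD83D\uDE00";

        // Option 2: an int code point.
        int cp = firstCodePoint(s);

        System.out.println(s.length());                      // prints 2
        System.out.println(s.codePointCount(0, s.length())); // prints 1
        System.out.println(Integer.toHexString(cp));         // prints 1f600

        // Converting the code point back yields the same two-char string.
        System.out.println(new String(Character.toChars(cp)).equals(s)); // prints true
    }
}
```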
Specification
The new behavior is in line with the existing specification. Specifically, JLS 3.10.4 states:
Character literals can only represent UTF-16 code units (§3.1), i.e., they are limited to values from \u0000 to \uffff.
That is, the specification does not allow a multi-surrogate character in a char literal.
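The range limit quoted above can be expressed as a simple check on a code point. The sketch below only mirrors the rule using standard Character methods. It is not how javac implements the check, and fitsInCharLiteral is a hypothetical name.

```java
// Sketch: the JLS range rule as a check on code points.
class CharLiteralRange {

    // Hypothetical helper: true if the code point fits in one UTF-16 code unit,
    // i.e. its value is at most 0xFFFF.
    static boolean fitsInCharLiteral(int codePoint) {
        return Character.isBmpCodePoint(codePoint);
    }

    public static void main(String[] args) {
        System.out.println(fitsInCharLiteral('A'));       // prints true
        System.out.println(fitsInCharLiteral(0xFFFF));    // prints true
        System.out.println(fitsInCharLiteral(0x10000));   // prints false
        System.out.println(Character.charCount(0x10000)); // prints 2
    }
}
```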
- csr of JDK-8354908 javac mishandles supplementary character in character literal (Open)