Enhancement
Resolution: Unresolved
P4
10
x86_64
linux_ubuntu
FULL PRODUCT VERSION :
openjdk version "10-ea" 2018-03-20
OpenJDK Runtime Environment 18.3 (build 10-ea+40)
OpenJDK 64-Bit Server VM 18.3 (build 10-ea+40, mixed mode)
java version "10-ea" 2018-03-20
Java(TM) SE Runtime Environment 18.3 (build 10-ea+40)
Java HotSpot(TM) 64-Bit Server VM 18.3 (build 10-ea+40, mixed mode)
ADDITIONAL OS VERSION INFORMATION :
Linux Nelkinda-Blade-Stealth-1 4.10.0-37-generic #41-Ubuntu SMP Fri Oct 6 20:20:37 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
A DESCRIPTION OF THE PROBLEM :
The JEP draft "Unicode 10" aims to "Upgrade existing platform APIs to support version 10.0.0 of the Unicode Standard." (http://openjdk.java.net/jeps/8182490)
Its "Non-Goals" section lists three related Unicode specifications that will not be implemented by that JEP: UTS #10, UTS #46 and UTS #39. Because UTS #51 (Unicode Emoji) is not among them, it must be assumed that UTS #51 is within the scope of this JEP.
Besides surrogate pairs, Unicode 10.0.0 and UTS #51 define two situations that require special treatment when reversing Strings:
• The code points U+FE0E and U+FE0F, which request text or emoji presentation (respectively) of the preceding character, must stay after that character.
• Pairs of regional indicator symbols (code points U+1F1E6 to U+1F1FF) must not be reordered.
Currently, JDK 10 does not handle these situations. The implementation is expected in the method java.lang.AbstractStringBuilder.reverse().
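To make the two situations concrete, the following Java sketch classifies code points. The helper names isPresentationSelector and isRegionalIndicator are hypothetical and not part of the JDK; the ranges are the ones listed above (U+FE0E/U+FE0F and U+1F1E6 to U+1F1FF).

```java
/**
 * Sketch of the two code point classes this report singles out.
 * The method names are hypothetical, not JDK API.
 */
public final class EmojiCodePoints {

    // U+FE0E (VARIATION SELECTOR-15) requests text presentation,
    // U+FE0F (VARIATION SELECTOR-16) requests emoji presentation.
    static boolean isPresentationSelector(int cp) {
        return cp == 0xFE0E || cp == 0xFE0F;
    }

    // Regional indicator symbols A..Z occupy U+1F1E6..U+1F1FF.
    static boolean isRegionalIndicator(int cp) {
        return cp >= 0x1F1E6 && cp <= 0x1F1FF;
    }

    public static void main(String[] args) {
        // Flag DE: two regional indicators, each a supplementary code point
        // stored as a surrogate pair (two chars).
        String flagDE = "\uD83C\uDDE9\uD83C\uDDEA";
        flagDE.codePoints().forEach(cp ->
                System.out.printf("U+%X regional=%b%n", cp, isRegionalIndicator(cp)));
        // prints "U+1F1E9 regional=true" and "U+1F1EA regional=true"

        System.out.println(isPresentationSelector('\uFE0F')); // prints "true"
    }
}
```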
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
• Create a String that contains two code points: an emoji followed by a presentation variation selector. Reverse the String using AbstractStringBuilder.reverse().
• Create a String that contains two code points forming an emoji flag from regional indicator symbols. Reverse the String using AbstractStringBuilder.reverse().
• Create a String that contains three code points: an emoji flag from regional indicator symbols followed by a presentation variation selector. Reverse the String using AbstractStringBuilder.reverse().
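The steps can also be reproduced without JUnit. The following is a small sketch that prints the code points before and after StringBuilder.reverse(); the class name ReverseRepro and its helpers are illustrative only. Because reverse() only keeps surrogate pairs together, each result lists the input code points in reverse order.

```java
/**
 * Reproduces the three steps with StringBuilder.reverse()
 * (implemented in AbstractStringBuilder) and prints the code points.
 */
public final class ReverseRepro {

    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // Renders a String as space-separated "U+XXXX" code points.
    static String codePoints(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> out.append(String.format("U+%X ", cp)));
        return out.toString().trim();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "\u260E\uFE0F",                    // step 1: emoji + presentation selector
            "\uD83C\uDDE9\uD83C\uDDEA",        // step 2: flag DE from two regional indicators
            "\uD83C\uDDE9\uD83C\uDDEA\uFE0E",  // step 3: flag DE + presentation selector
        };
        for (String in : inputs) {
            System.out.println(codePoints(in) + "  ->  " + codePoints(reverse(in)));
        }
        // prints:
        // U+260E U+FE0F  ->  U+FE0F U+260E
        // U+1F1E9 U+1F1EA  ->  U+1F1EA U+1F1E9
        // U+1F1E9 U+1F1EA U+FE0E  ->  U+FE0E U+1F1EA U+1F1E9
    }
}
```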
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
In all of the above cases, if the String contains no other characters, the reversed String is expected to equal the input String, just as it does for a String consisting of a single surrogate pair (a high surrogate followed by a low surrogate).
In terms of the provided test case, all tests in the unit test below are expected to pass. This implies that AbstractStringBuilder.reverse():
• reverses a String (it does)
• undoes swapping of surrogates (it does)
• undoes repositioning of presentation variation selectors (it doesn't)
• undoes swapping of regional indicator symbol letters (it doesn't)
• also behaves correctly for the combination of regional indicator symbols with presentation variation selectors (it doesn't).
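One possible shape for the expected behavior is sketched below, assuming only the two situations described in this report need handling: split the String into units (a code point, paired with a directly following regional indicator if both are regional indicators, plus any trailing presentation selectors), then reverse the order of the units rather than of the code points. This is a minimal illustration, not a proposed JDK patch; the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sequence-aware reverse; not JDK API. */
public final class SequenceAwareReverse {

    static boolean isPresentationSelector(int cp) {
        return cp == 0xFE0E || cp == 0xFE0F;
    }

    static boolean isRegionalIndicator(int cp) {
        return cp >= 0x1F1E6 && cp <= 0x1F1FF;
    }

    static String reverse(String s) {
        Deque<String> units = new ArrayDeque<>();
        int i = 0;
        while (i < s.length()) {
            int start = i;
            int cp = s.codePointAt(i);
            i += Character.charCount(cp); // keeps surrogate pairs together
            // Keep a pair of regional indicators together: they form one flag.
            if (isRegionalIndicator(cp) && i < s.length()
                    && isRegionalIndicator(s.codePointAt(i))) {
                i += Character.charCount(s.codePointAt(i));
            }
            // Keep trailing presentation selectors attached to their base.
            while (i < s.length() && isPresentationSelector(s.codePointAt(i))) {
                i += Character.charCount(s.codePointAt(i));
            }
            units.push(s.substring(start, i));
        }
        // ArrayDeque iterates from the most recently pushed unit,
        // so appending in iteration order reverses the unit order.
        StringBuilder out = new StringBuilder(s.length());
        for (String unit : units) {
            out.append(unit);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(reverse("Emoji \uD83C\uDDE9\uD83C\uDDEA\uFE0E Germany"));
    }
}
```

The greedy left-to-right pairing mirrors how consecutive regional indicators form flags from the start of a run. The sketch deliberately covers only the cases in this report; a complete solution would follow the extended grapheme cluster rules of UAX #29, which also cover combining marks, emoji modifiers and ZWJ sequences.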
ACTUAL -
In all of the above cases, the code points are reversed, although in these special situations they should not be. A code point followed by a presentation variation selector must stay intact, as must a pair of regional indicator symbols, and a pair of regional indicator symbols followed by a presentation variation selector.
In terms of the provided test case, neither presentation variation selectors nor regional indicator symbol letters are handled properly, so reversing Unicode strings that contain them produces wrong results:
• The presentation variation selector is applied to the wrong character.
• The flag emoji formed by the regional indicator symbol letters is destroyed (for example, US becomes SU).
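The US/SU example can be checked directly. The sketch below builds a flag from a two-letter region code by mapping A..Z onto U+1F1E6..U+1F1FF (the helper flag() is hypothetical and assumes uppercase ASCII input), then shows that StringBuilder.reverse() turns the regional indicator pair for US into the pair for SU.

```java
public final class FlagDemo {

    // Hypothetical helper: maps each ASCII letter A..Z to its regional
    // indicator symbol U+1F1E6..U+1F1FF. Assumes uppercase ASCII input.
    static String flag(String isoCode) {
        StringBuilder sb = new StringBuilder();
        for (char c : isoCode.toCharArray()) {
            sb.appendCodePoint(0x1F1E6 + (c - 'A'));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String us = flag("US");
        String reversed = new StringBuilder(us).reverse().toString();
        // reverse() keeps each surrogate pair intact but swaps the two
        // regional indicators, yielding the pair for "SU" instead of "US".
        System.out.println(reversed.equals(flag("SU"))); // prints "true"
    }
}
```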
REPRODUCIBILITY :
This bug can always be reproduced.
---------- BEGIN SOURCE ----------
package com.nelkinda.playground.java.test;

import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;

public class SurrogateReverseTest {

    @Test
    public void testSurrogateReverse() {
        // A single supplementary code point (U+1F5A6) stored as a surrogate pair;
        // reverse() already keeps the pair in order.
        String input = "\uD83D\uDDA6";
        String expected = input;
        String actual = new StringBuilder(input).reverse().toString();
        assertEquals(expected, actual);
    }

    @Test
    public void testPresentationSelectorReverse() {
        String input = "\u260E\uFE0E\u260F\uFE0F"; // black telephone as text, white telephone as emoji
        String expected = "\u260F\uFE0F\u260E\uFE0E"; // white telephone as emoji, black telephone as text
        String actual = new StringBuilder(input).reverse().toString();
        assertEquals(expected, actual);
    }

    @Test
    public void testFlagReverseU() {
        // Same as testFlagReverse, but with the flag written as literal characters
        String input = "Emoji 🇩🇪 Germany"; // DE
        String expected = "ynamreG 🇩🇪 ijomE"; // DE
        String actual = new StringBuilder(input).reverse().toString();
        assertEquals(expected, actual);
    }

    @Test
    public void testFlagReverse() {
        String input = "Emoji \uD83C\uDDE9\uD83C\uDDEA Germany"; // DE
        String expected = "ynamreG \uD83C\uDDE9\uD83C\uDDEA ijomE"; // DE
        String actual = new StringBuilder(input).reverse().toString();
        assertEquals(expected, actual);
    }

    @Test
    public void testFlagReverseWithPresentationSelector() {
        String input = "Emoji \uD83C\uDDE9\uD83C\uDDEA\uFE0E Germany"; // DE + text presentation
        String expected = "ynamreG \uD83C\uDDE9\uD83C\uDDEA\uFE0E ijomE"; // DE + text presentation
        String actual = new StringBuilder(input).reverse().toString();
        assertEquals(expected, actual);
    }

    @Test
    public void testTwoFlags() {
        String input = "\uD83C\uDDE9\uD83C\uDDEA\uD83C\uDDEC\uD83C\uDDE7"; // DE GB
        String expected = "\uD83C\uDDEC\uD83C\uDDE7\uD83C\uDDE9\uD83C\uDDEA"; // GB DE
        String actual = new StringBuilder(input).reverse().toString();
        assertEquals(expected, actual);
    }
}
---------- END SOURCE ----------