Type: Bug
Resolution: Fixed
Priority: P4
Affected Version: 11, 12
Resolved In Build: b15
CPU: x86_64
OS: linux_ubuntu
ADDITIONAL SYSTEM INFORMATION :
openjdk version "12-ea" 2019-03-19
OpenJDK Runtime Environment (build 12-ea+26)
OpenJDK 64-Bit Server VM (build 12-ea+26, mixed mode, sharing)
A DESCRIPTION OF THE PROBLEM :
Emoji sequences like 👨🏾 (U+1F468 U+1F3FE) or 👨‍👩‍👦 (U+1F468 U+200D U+1F469 U+200D U+1F466) are not clustered using the regular expression matcher \b{g} (a Unicode extended grapheme cluster boundary).
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
String stringmoji = new StringBuilder()
    .appendCodePoint(0x1f468).appendCodePoint(0x1f3fe)
    .appendCodePoint(0x1f468).appendCodePoint(0x200d)
    .appendCodePoint(0x1f469).appendCodePoint(0x200d)
    .appendCodePoint(0x1f466)
    .toString();
Pattern pattern = Pattern.compile("\\b{g}");
Function<String, String> toCodePointNumber = (cp) -> cp.codePoints().mapToObj(c -> String.format("%04x", c)).collect(Collectors.joining(","));
System.out.println(pattern.splitAsStream(stringmoji).map(toCodePointNumber).collect(Collectors.joining("][","[","]")));
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
[1f468,1f3fe][1f468,200d,1f469,200d,1f466]
ACTUAL -
[1f468][1f3fe][1f468,200d][1f469,200d][1f466]
FREQUENCY : always
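For convenience, the reproduction steps above can be wrapped in a self-contained class that compiles as-is. This is only a sketch of the same test: the class name GraphemeSplitCheck and the helper names sample, clusters, hex and render are made up for illustration and are not part of the original report. It uses only Pattern.compile("\\b{g}") and Pattern.splitAsStream, exactly as in the steps above.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical wrapper around the report's reproduction steps.
class GraphemeSplitCheck {
    // Same boundary matcher as in the report.
    private static final Pattern BOUNDARY = Pattern.compile("\\b{g}");

    // Builds the report's test string from its seven code points.
    static String sample() {
        return new StringBuilder()
                .appendCodePoint(0x1f468).appendCodePoint(0x1f3fe)
                .appendCodePoint(0x1f468).appendCodePoint(0x200d)
                .appendCodePoint(0x1f469).appendCodePoint(0x200d)
                .appendCodePoint(0x1f466)
                .toString();
    }

    // Splits the input at every position the matcher reports as a boundary.
    static List<String> clusters(String s) {
        return BOUNDARY.splitAsStream(s).collect(Collectors.toList());
    }

    // Renders one piece as comma-separated lowercase hex code points.
    static String hex(String cluster) {
        return cluster.codePoints()
                .mapToObj(c -> String.format("%04x", c))
                .collect(Collectors.joining(","));
    }

    // Renders all pieces in the bracketed form used in the report.
    static String render(List<String> clusters) {
        return clusters.stream()
                .map(GraphemeSplitCheck::hex)
                .collect(Collectors.joining("][", "[", "]"));
    }

    public static void main(String[] args) {
        System.out.println(render(clusters(sample())));
    }
}
```

Going by the report, an affected build should print the ACTUAL line above, and a build containing the fix should print the EXPECTED line; the output depends on the JDK it is run on.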