-
Bug
-
Resolution: Not an Issue
-
P4
-
None
-
17, 19
-
x86_64
-
windows_10
ADDITIONAL SYSTEM INFORMATION :
Windows 10 / 11
Java 17 / 19
A DESCRIPTION OF THE PROBLEM :
java.exe seems to mangle Combining Diacritical Marks
We observe in our test case that U+0301 Combining Acute Accent is translated to U+00B4 Acute Accent (the spacing, non-combining character) at some point after C main() but before Java main().
We also tested with python.exe for good measure and cannot reproduce the issue there.
See PowerShell commands below to reproduce the problem:
1. Pass é [101, 769] as command-line argument
2. Find e´ [101, 180] in the argument String value
$NFD = "$([char]101)$([char]769)"
PS C:\tmp> echo $NFD
é
PS C:\tmp> groovy -e "println args; println args*.codePoints()*.toList()" $NFD
[e´]
[[101, 180]]
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
$NFD = "$([char]101)$([char]769)"
PS C:\tmp> echo $NFD
é
PS C:\tmp> groovy -e "println args; println args*.codePoints()*.toList()" $NFD
[e´]
[[101, 180]]
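To rule out Groovy as a factor, the same check can be sketched in plain Java. The class name `ArgCodePoints` and the helper `codePoints` are hypothetical names chosen for illustration; the program only prints each argument followed by its code points, as the Groovy one-liner does.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class (not part of the original report):
// prints every command-line argument together with its Unicode code points.
public class ArgCodePoints {

    // Returns the code points of s as a list, e.g. "e" + U+0301 gives [101, 769].
    static List<Integer> codePoints(String s) {
        return s.codePoints().boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        for (String arg : args) {
            System.out.println(arg + " " + codePoints(arg));
        }
    }
}
```

Saved as `ArgCodePoints.java`, it can be launched from the same PowerShell session with `java ArgCodePoints.java $NFD`. If the behavior described in this report applies, the list printed for the argument would be [101, 180] rather than [101, 769].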
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
When passing é [101, 769] on the command-line, we would like the String value on the Java side to be those exact code points.
ACTUAL -
é [101, 769] changes to e´ [101, 180], which are entirely different code points. The two strings are not canonically equivalent under NFC or NFD.
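The claim that the two strings are not NFC / NFD equivalent can be checked with `java.text.Normalizer`. This is a minimal sketch; the class name `NormalizationCheck` and the method `canonicallyEquivalent` are hypothetical, and the comparison is done on the NFD forms of both strings.

```java
import java.text.Normalizer;

// Hypothetical helper class (not part of the original report):
// compares two strings for canonical equivalence via their NFD forms.
public class NormalizationCheck {

    // True if a and b decompose to the same NFD sequence.
    static boolean canonicallyEquivalent(String a, String b) {
        String nfdA = Normalizer.normalize(a, Normalizer.Form.NFD);
        String nfdB = Normalizer.normalize(b, Normalizer.Form.NFD);
        return nfdA.equals(nfdB);
    }

    public static void main(String[] args) {
        String sent = "e\u0301";     // e + U+0301 COMBINING ACUTE ACCENT
        String received = "e\u00B4"; // e + U+00B4 ACUTE ACCENT (spacing)
        String composed = "\u00E9";  // precomposed U+00E9

        System.out.println(canonicallyEquivalent(sent, composed));
        System.out.println(canonicallyEquivalent(sent, received));
    }
}
```

The first comparison covers the ordinary NFC / NFD relationship between the decomposed and precomposed forms of é; the second compares the string that was passed with the string that arrived on the Java side.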
---------- BEGIN SOURCE ----------
$NFD = "$([char]101)$([char]769)"
groovy -e "println args; println args*.codePoints()*.toList()" $NFD
---------- END SOURCE ----------
FREQUENCY : always
- duplicates
-
JDK-8294885 java.exe mangles command-line arguments with Combining Diacritical Marks
-
- Closed
-