Details
- Type: Bug
- Resolution: Fixed
- Priority: P4
- Affected Version: 8, 9, 10, 11
- Resolved In Build: b12
- CPU: x86_64
- OS: windows_10
Backports
Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-8233391 | 11.0.7-oracle | Kiran Sidhartha Ravikumar | P4 | Resolved | Fixed | b01 |
JDK-8221843 | 11.0.4 | Joe Wang | P4 | Resolved | Fixed | b01 |
JDK-8221767 | 11.0.3 | Christoph Langer | P4 | Closed | Won't Fix | |
JDK-8221617 | openjdk8u222 | Joe Wang | P4 | Resolved | Fixed | b01 |
JDK-8233400 | 8u251 | Kiran Sidhartha Ravikumar | P4 | Resolved | Fixed | b01 |
JDK-8239635 | emb-8u251 | Aleksej Efimov | P4 | Resolved | Fixed | team |
Description
During processing, the XML stream is split into chunks of 1024 bytes.
If a multi-char symbol (e.g. an emoji) falls on the boundary between two chunks, the first chunk ends with the first char of the symbol and the second chunk starts with the second char of the symbol.
The given example contains the "fallen leaf" Unicode symbol (https://www.compart.com/en/unicode/U+1F342). In UTF-16 it is represented by two chars, 0xD83C (high surrogate) and 0xDF42 (low surrogate). When the second char is carried over to the next chunk, the first char 0xD83C is left without its pair and is reported as an invalid character.
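The boundary condition described above can be illustrated with a small standalone sketch. It is not the JDK serializer code and not the fix that was applied. The class name `SurrogateSplitDemo`, both helper methods, and the tiny chunk size are hypothetical and chosen only for illustration. The sketch shows how a fixed-size cut can land between the two chars of U+1F342, and one possible way to avoid it by extending the chunk by one char.

```java
public class SurrogateSplitDemo {

    /** Returns true if cutting the text at index end would separate a surrogate pair. */
    static boolean splitsSurrogatePair(CharSequence text, int end) {
        return end > 0 && end < text.length()
                && Character.isHighSurrogate(text.charAt(end - 1))
                && Character.isLowSurrogate(text.charAt(end));
    }

    /**
     * Computes the exclusive end index of the chunk starting at start.
     * If the naive boundary would split a surrogate pair, the chunk is
     * extended by one char so that the pair stays together.
     */
    static int safeChunkEnd(CharSequence text, int start, int chunkSize) {
        int end = Math.min(start + chunkSize, text.length());
        return splitsSurrogatePair(text, end) ? end + 1 : end;
    }

    public static void main(String[] args) {
        // U+1F342 (fallen leaf) is encoded in UTF-16 as 0xD83C 0xDF42.
        String leaf = new String(Character.toChars(0x1F342));
        // High surrogate at index 3, low surrogate at index 4.
        String text = "abc" + leaf + "def";

        int chunkSize = 4; // hypothetical, stands in for the real chunk size
        System.out.println("naive cut splits pair: " + splitsSurrogatePair(text, chunkSize));

        // Walk the text chunk by chunk without ever splitting a pair.
        for (int start = 0; start < text.length(); ) {
            int end = safeChunkEnd(text, start, chunkSize);
            System.out.println("chunk [" + start + ", " + end + ")");
            start = end;
        }
    }
}
```

Extending the chunk forward rather than shortening it guarantees progress even when the chunk size is 1; a real serializer might instead hold back the high surrogate until the next chunk arrives.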
---------- BEGIN SOURCE ----------
https://github.com/dkBrazz/reproduce-jdk-xslt-bug
---------- END SOURCE ----------
FREQUENCY : always
Issue Links
- backported by
  - JDK-8221617 SAXException: Invalid UTF-16 surrogate detected: d83c ? (Resolved)
  - JDK-8221843 SAXException: Invalid UTF-16 surrogate detected: d83c ? (Resolved)
  - JDK-8233391 SAXException: Invalid UTF-16 surrogate detected: d83c ? (Resolved)
  - JDK-8233400 SAXException: Invalid UTF-16 surrogate detected: d83c ? (Resolved)
  - JDK-8239635 SAXException: Invalid UTF-16 surrogate detected: d83c ? (Resolved)
  - JDK-8221767 SAXException: Invalid UTF-16 surrogate detected: d83c ? (Closed)
- relates to
  - JDK-8215543 New line and indentation is added to the content of CDATA sections within XML (Resolved)