JDK-8207760

SAXException: Invalid UTF-16 surrogate detected: d83c ?

Details

      Description

        A DESCRIPTION OF THE PROBLEM :
        During processing, the XML stream is split into chunks of 1024 bytes.
        If a symbol that is encoded as two chars (e.g. an emoji) straddles the boundary between two chunks, the first chunk ends with the first char of the symbol and the second chunk starts with its second char.
        The given example contains the "fallen leaf" Unicode symbol (https://www.compart.com/en/unicode/U+1F342). Its UTF-16 representation consists of two chars: 0xD83C and 0xDF42. When the second char is carried over to the next chunk, the first char 0xD83C is treated as a lone, invalid surrogate.
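
        The boundary condition described above can be illustrated with a small sketch. This is not the JDK's serializer code and not the linked reproducer; the class name, the helper methods and the tiny chunk size are hypothetical, and the chunk size is counted in chars here rather than bytes to keep the example short. It only shows how a cut between 0xD83C and 0xDF42 leaves a lone high surrogate at the end of the first chunk, and one possible way to detect that case.

```java
// Hypothetical illustration only: not the JDK implementation.
class SurrogateSplitDemo {
    // U+1F342 FALLEN LEAF written as its UTF-16 surrogate pair.
    static final String FALLEN_LEAF = "\uD83C\uDF42";

    /** Returns true if cutting text at index cut separates a surrogate pair. */
    static boolean splitsSurrogatePair(String text, int cut) {
        return cut > 0 && cut < text.length()
                && Character.isHighSurrogate(text.charAt(cut - 1))
                && Character.isLowSurrogate(text.charAt(cut));
    }

    /** Moves the cut one char back if it would separate a surrogate pair. */
    static int safeCut(String text, int cut) {
        return splitsSurrogatePair(text, cut) ? cut - 1 : cut;
    }

    public static void main(String[] args) {
        // Assumed chunk size in chars, chosen small for illustration.
        int chunkSize = 4;
        String text = "abc" + FALLEN_LEAF + "def";

        // A naive cut at the chunk boundary ends the first chunk with 0xD83C.
        String firstChunk = text.substring(0, chunkSize);
        char last = firstChunk.charAt(firstChunk.length() - 1);
        System.out.println("last char of naive chunk: "
                + String.format("%04x", (int) last));
        System.out.println("naive cut splits a pair: "
                + splitsSurrogatePair(text, chunkSize));

        // Moving the cut back keeps both halves of the pair in the next chunk.
        System.out.println("adjusted cut index: " + safeCut(text, chunkSize));
    }
}
```

        Deferring the high surrogate to the next chunk is only one possible approach; the sketch makes no claim about how the issue was actually resolved.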


        ---------- BEGIN SOURCE ----------
        https://github.com/dkBrazz/reproduce-jdk-xslt-bug
        ---------- END SOURCE ----------

        FREQUENCY : always


              People

                joehw Joe Wang
                webbuggrp Webbug Group