• Type: Sub-task
    • Resolution: Fixed
    • Priority: P4
    • 24
    • 24
    • hotspot
    • None
    • b14

      The modified UTF-8 format used by the VM can require up to six bytes to represent one Unicode character, but six-byte characters are stored as UTF-16 surrogate pairs, with each half of the pair encoded in three bytes. Hence the most bytes needed per UTF-16 code unit (Java char) is 3, and so the maximum UTF-8 length of a string is 3*Integer.MAX_VALUE. With compact strings this reduces to 2*Integer.MAX_VALUE. Either bound exceeds the range of int, so the low-level UTF8/UNICODE API should define UTF-8 lengths as size_t to accommodate all possible representations. Higher-level APIs can still use int if they know the strings (e.g. symbols) are sufficiently constrained in length.
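
      The length calculation described above can be sketched as follows. This is a minimal illustration, not the HotSpot implementation: the function name modified_utf8_length, the constant kWorstCaseBytes, and the use of uint16_t for a Java char are assumptions made for the example. It counts the modified UTF-8 bytes needed for a sequence of UTF-16 code units and returns the result as size_t, since the worst case of 3 bytes per code unit does not fit in a 32-bit int.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Worst case for a string of Integer.MAX_VALUE chars: 3 bytes per char.
// (kWorstCaseBytes is a name invented for this sketch.)
constexpr uint64_t kWorstCaseBytes = 3ULL * static_cast<uint64_t>(INT32_MAX);
static_assert(kWorstCaseBytes > static_cast<uint64_t>(INT32_MAX),
              "worst-case UTF-8 length does not fit in a 32-bit int");

// Hypothetical helper, not the real HotSpot API: returns the number of bytes
// needed to encode `length` UTF-16 code units in modified UTF-8.
size_t modified_utf8_length(const uint16_t* chars, size_t length) {
  assert(chars != nullptr || length == 0);
  size_t result = 0;
  for (size_t i = 0; i < length; i++) {
    const uint16_t c = chars[i];
    if (c >= 0x0001 && c <= 0x007F) {
      result += 1;  // ASCII other than NUL
    } else if (c <= 0x07FF) {
      result += 2;  // NUL (encoded as 0xC0 0x80) and U+0080..U+07FF
    } else {
      result += 3;  // U+0800..U+FFFF, including each surrogate half
    }
  }
  return result;
}
```

      A supplementary character such as U+1F600 arrives here as the surrogate pair 0xD83D 0xDE00 and contributes 3 + 3 = 6 bytes, which is why the per-char bound is 3 rather than 6.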

            dholmes David Holmes