Details

    • Type: Sub-task
    • Resolution: Fixed
    • Priority: P4
    • Affects Version: 24
    • Fix Version: 24
    • Component: hotspot
    • Labels: None
    • Resolved In Build: b14

    Description

      The modified UTF-8 format used by the VM can require up to six bytes to represent one Unicode character, but six-byte characters are stored as UTF-16 surrogate pairs, with each half of the pair encoded in three bytes. Hence a single Java char needs at most 3 bytes, and so the maximum encoded length of a string is 3*Integer.MAX_VALUE bytes. With compact strings this reduces to 2*Integer.MAX_VALUE, because a Latin-1 string can hold up to Integer.MAX_VALUE chars of at most 2 bytes each, while a UTF-16 string can hold only half as many chars. The low-level UTF8/UNICODE API should define UTF8 lengths as size_t to accommodate all possible representations. Higher-level APIs can still use int if they know the strings (e.g. symbols) are sufficiently constrained in length.
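
      The length calculation described above can be sketched in C++ as follows. This is only an illustration under stated assumptions, not the HotSpot code: the function names modified_utf8_size, modified_utf8_length and modified_utf8_length_as_int are hypothetical, and the input is assumed to be an array of UTF-16 code units. The low-level function accumulates into size_t, and the int-returning wrapper stands in for a higher-level API whose callers guarantee short strings.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not the HotSpot API): number of bytes needed to
// encode one UTF-16 code unit in modified UTF-8.
inline std::size_t modified_utf8_size(std::uint16_t c) {
  if (c >= 0x0001 && c <= 0x007F) return 1;  // ASCII, excluding NUL
  if (c <= 0x07FF) return 2;                 // includes NUL, encoded as 0xC0 0x80
  return 3;                                  // all other units, including each surrogate half
}

// Low-level length: accumulated in size_t, so a maximal string
// (up to 3 bytes per code unit) cannot overflow the result on a 64-bit build.
inline std::size_t modified_utf8_length(const std::uint16_t* chars, std::size_t count) {
  std::size_t total = 0;
  for (std::size_t i = 0; i < count; i++) {
    total += modified_utf8_size(chars[i]);
  }
  return total;
}

// Higher-level convenience for callers that know their strings are short
// (the description gives symbols as an example). The assert documents that
// precondition; it is an assumption of this sketch, not a VM guarantee.
inline int modified_utf8_length_as_int(const std::uint16_t* chars, std::size_t count) {
  std::size_t len = modified_utf8_length(chars, count);
  assert(len <= static_cast<std::size_t>(INT_MAX));
  return static_cast<int>(len);
}
```

      Note that a supplementary character contributes two code units of three bytes each, which is where the six-byte figure in the description comes from.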

            People

              Assignee: dholmes David Holmes
              Reporter: dholmes David Holmes
              Votes: 0
              Watchers: 3
