JDK-4407426

UTF-8 encoding suggests little-endian bit order (big-endian is correct)


    • Type: Bug
    • Resolution: Fixed
    • Priority: P3
    • Fix Version: 1.4.0
    • Affects Version: 1.3.0
    • Component: docs
    • Resolved In Build: beta2
    • CPU: generic
    • OS: generic



      Name: ssT124754 Date: 01/23/2001


      java version "1.3.0"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.0-C)
      Java HotSpot(TM) Client VM (build 1.3.0-C, mixed mode)


      In

      http://java.sun.com/j2se/1.3/docs/api/java/io/DataInputStream.html

      you explain how you UTF-8-encode UTF-16 characters. You say, for example:

      "All characters in the range '\u0001' to '\u007F' are represented by a single
      byte: |0|bits 0-7|"

      You should say: "|0|bits 7-0|" or "|0|bits 7..0|", because some readers might
      think that the most significant UTF-16 bit is stored in the least significant
      UTF-8 bit. Likewise, you should say

      The null character '\u0000' and characters in the range '\u0080' to '\u07FF' are
      represented by a pair of bytes: |1|1|0|bits 10..6| , |1|0|bits 5..0|

      and so on.
      (Review ID: 109528)
      ======================================================================
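For the two-byte case quoted above, the bit order can be checked directly. Below is a minimal sketch (the class name and the sample character '\u00E9' are arbitrary choices) that assembles the two bytes most-significant-bits-first and compares them with what DataOutputStream.writeUTF produces:

      import java.io.ByteArrayOutputStream;
      import java.io.DataOutputStream;
      import java.io.IOException;

      public class Utf8BitOrder {
          public static void main(String[] args) throws IOException {
              // '\u00E9' is an arbitrary character from the two-byte range '\u0080'..'\u07FF'.
              char c = '\u00E9';

              // Manual encoding, most significant UTF-16 bits first (big-endian bit order):
              //   first byte  = 1 1 0 | bits 10..6
              //   second byte = 1 0   | bits 5..0
              int b1 = 0xC0 | ((c >> 6) & 0x1F);
              int b2 = 0x80 | (c & 0x3F);

              // Cross-check against DataOutputStream.writeUTF; its first two output
              // bytes are the string's encoded length, so the character starts at index 2.
              ByteArrayOutputStream bos = new ByteArrayOutputStream();
              DataOutputStream out = new DataOutputStream(bos);
              out.writeUTF(String.valueOf(c));
              byte[] utf = bos.toByteArray();

              System.out.printf("manual  : %02X %02X%n", b1, b2);
              System.out.printf("writeUTF: %02X %02X%n", utf[2] & 0xFF, utf[3] & 0xFF);
          }
      }

Both lines print C3 A9: bits 10..6 of the character land in the first byte and bits 5..0 in the second, which is the big-endian bit order the report asks the documentation to make explicit.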

            Assignee: Scott Hommel (shommel) (Inactive)
            Reporter: Shaheen Sultana (ssultanasunw) (Inactive)
