JDK-4614120

UTF-8 vmspec not verified by java -Xfuture


    • Type: Bug
    • Resolution: Fixed
    • Priority: P4
    • 1.4.2
    • 1.4.0, 1.4.2
    • hotspot
    • mantis
    • generic
    • generic

      Name: gm110360 Date: 12/14/2001


      java version "1.4.0-beta3"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0-beta3-b84)
      Java HotSpot(TM) Client VM (build 1.4.0-beta3-b84, mixed mode)

      > http://java.sun.com/products/jdk/1.2/compatibility.html

      > Runtime Incompatibilities in Version 1.2

      > In JDK 1.2 software
      > the -Xfuture option enables the strictest possible
      > class-file format checks ...

      If only this were true. Please try the demo below, see that something's
      broken, tell me what it is, and fix it.

      > reject... illegal UTF-8 strings

      I hope I'm right to think you agree that vmspec UTF-8, as defined by "4.4.7 The
      CONSTANT_Utf8_info Structure", is shortest-form UTF-8 only, except that u0000,
      if present, always appears as x C0 80?

      By that definition, the `java -Xfuture` verification of the .class file format
      rejects far fewer than all forms of "illegal UTF-8 strings".
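      A minimal Java sketch of that reading of 4.4.7, assuming chars up to uFFFF only:
      one byte for u0001..u007F, two bytes for u0000 (always C0 80) and u0080..u07FF,
      three bytes for u0800..uFFFF, so each char has exactly one encoding. The class
      name `VmspecUtf8` is hypothetical. `DataOutputStream.writeUTF`, which writes a
      u2 length followed by the same modified UTF-8, serves as a cross-check.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical encoder for the shortest-form reading of 4.4.7 (chars <= uFFFF).
final class VmspecUtf8 {
    static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                out.write(c);                          // 0xxx:xxxx
            } else if (c <= 0x07FF) {                  // includes u0000 -> C0 80
                out.write(0xC0 | (c >> 6));            // 110x:xxxx
                out.write(0x80 | (c & 0x3F));          // 10xx:xxxx
            } else {
                out.write(0xE0 | (c >> 12));           // 1110:xxxx
                out.write(0x80 | ((c >> 6) & 0x3F));   // 10xx:xxxx
                out.write(0x80 | (c & 0x3F));          // 10xx:xxxx
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String s = "theInt\u0000\u00E0\u0401\uFFFF";
        // writeUTF emits a u2 length, then modified UTF-8; drop the length.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream data = new DataOutputStream(buf);
        data.writeUTF(s);
        data.flush();
        byte[] lib = Arrays.copyOfRange(buf.toByteArray(), 2, buf.size());
        System.out.println(Arrays.equals(encode(s), lib)); // true
    }
}
```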

      1) The verification never complains of not-shortest-form UTF-8. (Though it
      does complain of the too-short-form x 00.)

      2) The verification accepts truncated and ill-formed UTF-8 in string values,
      attribute names, and unused entries.
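      A sketch of the stricter check the two points above ask for, under the same
      reading of 4.4.7. The class and method names are hypothetical; this is not
      HotSpot's verifier. It rejects a raw 00 byte, stray continuation bytes and
      1111:xxxx lead bytes, truncated and ill-formed sequences, and every
      not-shortest form except C0 80.

```java
// Hypothetical strict check for vmspec UTF-8 (chars <= uFFFF).
final class StrictUtf8Check {
    // True if x has the continuation pattern 10xx:xxxx.
    static boolean isCont(byte x) {
        return (x & 0xC0) == 0x80;
    }

    static boolean isVmspecUtf8(byte[] b) {
        int i = 0;
        while (i < b.length) {
            int c = b[i] & 0xFF;
            if (c >= 0x01 && c <= 0x7F) {                  // 0xxx:xxxx, but never 00
                i += 1;
            } else if ((c & 0xE0) == 0xC0) {               // 110x:xxxx 10xx:xxxx
                if (i + 1 >= b.length || !isCont(b[i + 1])) return false; // truncated or ill-formed
                int v = ((c & 0x1F) << 6) | (b[i + 1] & 0x3F);
                if (v != 0 && v < 0x80) return false;      // not shortest form (C0 80 allowed)
                i += 2;
            } else if ((c & 0xF0) == 0xE0) {               // 1110:xxxx 10xx:xxxx 10xx:xxxx
                if (i + 2 >= b.length || !isCont(b[i + 1]) || !isCont(b[i + 2])) return false;
                int v = ((c & 0x0F) << 12) | ((b[i + 1] & 0x3F) << 6) | (b[i + 2] & 0x3F);
                if (v < 0x800) return false;               // not shortest form
                i += 3;
            } else {
                return false;                              // 00, stray 10xx:xxxx, or 1111:xxxx
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] truncated = {0x43, 0x6F, 0x6E, (byte) 0xE0};               // "Con" + lone E0
        byte[] illFormed = {0x43, 0x6F, (byte) 0xD0, 0x01};               // D0 then not 10xx:xxxx
        byte[] overlong  = {0x43, (byte) 0xE0, (byte) 0x90, (byte) 0x81}; // u0401 in 3 bytes
        byte[] nul       = {(byte) 0xC0, (byte) 0x80};                    // the legal u0000
        System.out.println(isVmspecUtf8(truncated)); // false
        System.out.println(isVmspecUtf8(illFormed)); // false
        System.out.println(isVmspecUtf8(overlong));  // false
        System.out.println(isVmspecUtf8(nul));       // true
    }
}
```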

      We care because, by design, vmspec UTF-8 defines at most one way to represent
      any sequence of chars. If more than one sequence of bytes counts as equal to a
      given sequence of chars, we raise unanswerable questions: Does one method
      override another? Is a field present? Is a constant initialiser present?
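      To make the ambiguity concrete, a hypothetical lenient decoder that checks only
      lead-byte patterns and never minimality (the class name and its behavior are
      illustrative assumptions, not a description of any particular VM). It maps the
      shortest form D0 81 and the not-shortest form E0 90 81 to the same char u0401,
      so two names that differ as bytes compare equal as chars.

```java
import java.util.Arrays;

// Hypothetical lenient decoder: no continuation or shortest-form checks,
// and it assumes every multi-byte sequence is complete.
final class LenientDecode {
    static String decode(byte[] b) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < b.length) {
            int c = b[i] & 0xFF;
            if (c < 0x80) {                                // 0xxx:xxxx
                sb.append((char) c);
                i += 1;
            } else if ((c & 0xE0) == 0xC0) {               // 110x:xxxx ..
                sb.append((char) (((c & 0x1F) << 6) | (b[i + 1] & 0x3F)));
                i += 2;
            } else if ((c & 0xF0) == 0xE0) {               // 1110:xxxx .. ..
                sb.append((char) (((c & 0x0F) << 12)
                        | ((b[i + 1] & 0x3F) << 6) | (b[i + 2] & 0x3F)));
                i += 3;
            } else {
                throw new IllegalArgumentException("bad lead byte at " + i);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] shortest = {0x6E, (byte) 0xD0, (byte) 0x81};              // "n" + u0401
        byte[] overlong = {0x6E, (byte) 0xE0, (byte) 0x90, (byte) 0x81}; // same chars, longer
        System.out.println(Arrays.equals(shortest, overlong));             // false
        System.out.println(decode(shortest).equals(decode(overlong)));     // true
    }
}
```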

      Now for the promised quick, rough demo of some of this. Try editing the binary
      A.class after compiling this source:

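              class A
                  {
                  // The initialiser of this constant gives theInt a
                  // ConstantValue attribute, named by a Utf8 entry.
                  final static int theInt = 0x9ABCDEF0;
                  // javac shares that same "ConstantValue" Utf8 entry
                  // with this string literal.
                  String theString = "ConstantValue";
                  }

              class B
                  {
                  public static void main(String[] strings)
                      {
                      // Prints true: u00E0 is a legal identifier char, which
                      // suggests that rejecting a field name ending in a lone
                      // xE0 byte is a UTF-8 complaint, not an identifier one.
                      System.out.println(Character.isJavaIdentifierPart('\u00E0'));

                      // Print each char of theString in hex, to show how the
                      // VM decoded the (possibly edited) Utf8 entry.
                      A a = new A();
                      String st = a.theString;
                      for (int index = 0; index < st.length(); ++index)
                          {
                          char ch = st.charAt(index);
                          System.out.println("x" + Integer.
                                  toHexString(ch).toUpperCase());
                          }
                      }
                  }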

      In the binary A.class, confirm you see only one CONSTANT_Utf8_info entry that
      equals "theInt":

              01 00:06 74 68 65 49 6E 74 // theInt

      See also that `java -Xfuture B` accepts the A.class binary.
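      To confirm there is only one such entry, a sketch that walks the constant pool
      rather than eyeballing a hex dump. It assumes a well-formed class file and the
      constant pool tags of the Java SE class file format (tags 15 through 20 postdate
      this report and are harmless here); the class name `Utf8Entries` is
      hypothetical. Each Utf8 entry is reported as the file offset of its tag byte and
      its bytes mapped one char per byte, so malformed bytes still show up.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical constant-pool walker that lists CONSTANT_Utf8_info entries.
final class Utf8Entries {
    static int u2(byte[] b, int off) {
        return ((b[off] & 0xFF) << 8) | (b[off + 1] & 0xFF);
    }

    // Returns "offset: text" for each Utf8 entry (text is one char per byte).
    static List<String> list(byte[] cls) {
        List<String> out = new ArrayList<>();
        int count = u2(cls, 8);        // constant_pool_count, after magic, minor, major
        int off = 10;
        for (int index = 1; index < count; index++) {
            int tag = cls[off] & 0xFF;
            switch (tag) {
                case 1: {                                  // Utf8: u2 length, then bytes
                    int len = u2(cls, off + 1);
                    out.add(off + ": " + new String(cls, off + 3, len, StandardCharsets.ISO_8859_1));
                    off += 3 + len;
                    break;
                }
                case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                    off += 5;                              // tag + 4 bytes
                    break;
                case 5: case 6:
                    off += 9;                              // Long/Double: tag + 8 bytes,
                    index++;                               // and they take two slots
                    break;
                case 7: case 8: case 16: case 19: case 20:
                    off += 3;                              // tag + u2
                    break;
                case 15:
                    off += 4;                              // MethodHandle: tag + u1 + u2
                    break;
                default:
                    throw new IllegalArgumentException("unknown tag " + tag + " at " + off);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Usage (hypothetical): java Utf8Entries A.class
        for (String entry : list(Files.readAllBytes(Paths.get(args[0])))) {
            System.out.println(entry);
        }
    }
}
```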

      Now change the A.class binary. Change the trailing x74 to an xE0. See that
      `java -Xfuture B` explodes, complaining of an "Illegal Field name". So far so
      good.

      Now restore the original A.class binary (most simply, by recompiling it). Find
      the one entry:

              01 00:0D 43 6F 6E 73 74 61 6E 74 56 61 6C 75 65 // ConstantValue

      Change the trailing x65 to an xE0. See that `java -Xfuture B` is happy.

      Conclude that string values and attribute names may contain truncated Utf (here,
      the lead byte xE0 of a three-byte sequence, with no continuation bytes after it).
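      The hand edits above can also be scripted. A sketch, assuming A.class sits in
      the current directory: it searches the raw bytes for 01, the u2 length, and the
      ASCII name, insists on exactly one match, and overwrites the last byte of the
      name. This is a crude byte search rather than a constant-pool walk, and `main`
      overwrites A.class in place, so recompile to restore it. The class and method
      names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical patcher: replace the last byte of one CONSTANT_Utf8_info entry.
final class PatchUtf8Tail {
    static boolean matchesAt(byte[] a, int i, byte[] p) {
        for (int j = 0; j < p.length; j++) {
            if (a[i + j] != p[j]) return false;
        }
        return true;
    }

    static byte[] patchLastByte(byte[] cls, String asciiName, int newByte) {
        byte[] name = asciiName.getBytes(StandardCharsets.US_ASCII);
        byte[] pattern = new byte[3 + name.length];
        pattern[0] = 1;                                   // CONSTANT_Utf8 tag
        pattern[1] = (byte) (name.length >> 8);           // u2 length, big-endian
        pattern[2] = (byte) name.length;
        System.arraycopy(name, 0, pattern, 3, name.length);
        int found = -1;
        for (int i = 0; i + pattern.length <= cls.length; i++) {
            if (matchesAt(cls, i, pattern)) {
                if (found >= 0) throw new IllegalStateException("more than one match");
                found = i;
            }
        }
        if (found < 0) throw new IllegalStateException("no match for " + asciiName);
        byte[] out = cls.clone();                         // leave the input untouched
        out[found + pattern.length - 1] = (byte) newByte;
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get("A.class");                 // assumed location
        byte[] patched = patchLastByte(Files.readAllBytes(path), "ConstantValue", 0xE0);
        Files.write(path, patched);                       // then run: java -Xfuture B
    }
}
```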

      Repeat, if you like, changing the two trailing bytes, to see that constant pool
      Utf may contain ill-formed Utf, such as x D0 01 (the lead byte b110x:xxxx is not
      followed by a continuation byte b10xx:xxxx).

      Repeat, if you like, changing the three trailing bytes, to see that constant
      pool Utf may contain not-shortest-form Utf, such as x E0 90 81 (u0401, whose
      shortest form is x D0 81). So may field names, etc.
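      For comparison, a sketch that feeds the three edited tails to
      `DataInputStream.readUTF`, the Java library's reader for the same modified
      UTF-8 (after a u2 length). Its contract in `DataInput` specifies bit-pattern
      checks but no shortest-form check, so it rejects the truncated and ill-formed
      tails yet accepts E0 90 81 as u0401. The helper names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ReadUtfCheck {
    // Frame the bytes with a u2 length, as readUTF expects; null means rejected.
    static String tryReadUtf(byte[] utf) {
        byte[] framed = new byte[utf.length + 2];
        framed[0] = (byte) (utf.length >> 8);
        framed[1] = (byte) utf.length;
        System.arraycopy(utf, 0, framed, 2, utf.length);
        try {
            return new DataInputStream(new ByteArrayInputStream(framed)).readUTF();
        } catch (UTFDataFormatException e) {
            return null;                                  // malformed input
        } catch (IOException e) {
            throw new RuntimeException(e);                // not expected in memory
        }
    }

    // ASCII head followed by raw tail bytes.
    static byte[] withTail(String head, int... tail) {
        byte[] h = head.getBytes(StandardCharsets.US_ASCII);
        byte[] b = Arrays.copyOf(h, h.length + tail.length);
        for (int i = 0; i < tail.length; i++) b[h.length + i] = (byte) tail[i];
        return b;
    }

    public static void main(String[] args) {
        System.out.println(tryReadUtf(withTail("ConstantValu", 0xE0)));       // null (truncated)
        System.out.println(tryReadUtf(withTail("ConstantVal", 0xD0, 0x01)));  // null (ill-formed)
        String s = tryReadUtf(withTail("ConstantVa", 0xE0, 0x90, 0x81));
        System.out.println("ConstantVa\u0401".equals(s));                     // true (overlong accepted)
    }
}
```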

      Please tell me what's broken and fix it - or unconfuse me!

      Thanks in advance. Pat LaVarre

      > http://developer.java.sun.com/developer/bugParade/
      > +-Xfuture +utf
      > 4 Results Found, Sorted by [lack of] Relevance
      (Review ID: 136117)
      ======================================================================

            wtaosunw Wei Tao (Inactive)
            gmanwanisunw Girish Manwani (Inactive)
