Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-6611830

UUID thread-safety and performance improvements

XMLWordPrintable

    • b136
    • generic
    • generic
    • Not verified

      Josh Bloch writes:

      Google's Daniel Aioanei <###@###.###> points out that the java.util.UUID, while supposedly immutable, is not thread-safe, due errors in caching the variant and hashCode. Let's first take a look at the code for variant:

       87 private transient int variant = -1;

      277 public int variant() {
      278 if (variant < 0) {
      279 // This field is composed of a varying number of bits
      280 if ((leastSigBits >>> 63) == 0) {
      281 variant = 0;
      282 } else if ((leastSigBits >>> 62) == 2) {
      283 variant = 2;
      284 } else {
      285 variant = (int)(leastSigBits >>> 61);
      286 }
      287 }
      288 return variant;
      289 }


      At first glance, this code appears to be equivalent to the hashCode caching code for java.lang.String , which is known to be correct. But there's one key difference: this code uses -1, rather than 0, as a marker for the unintialized value. If a UUID is passed from one thread to another with no synchronization, it is possible for the latter thread to see the variant field in its truly uninitialized (0) state, which will cause it to falsely assume that the variant is 0.

      Looking at the computation, it's way too cheap to justify lazy computation. and it can be made even cheaper. Here's a branch-free computation:

          private transient final int variant =
                    (int) ((leastSigBits >>> (64 - (leastSigBits >>> 62)))
                            & (leastSigBits >> 63));

      Unfortunately, serialization has serious deficiencies when it comes to transient final fields: you must set them in the readObject method, and setting a final field is, um, tricky. You can do it using setAccessble, but that can interact badly with some security managers. Alternatively, you can use sun.misc.Unsafe, as I believe is done for System.in and System.out. An alternative approach is not to cache the variant but to recalculate it every time. This is simple, and perhaps so cheap that it's worth considering:

          public int variant() {
              return (int) ((leastSigBits >>> (64 - (leastSigBits >>> 62)))
                            & (leastSigBits >> 63));
          }

      Caching for version(), timestamp(), clockSequence(), and node() are broken in similar manner, and similar comments apply.



      On to hashCode:

      107 private transient int hashCode = -1;

      416 public int hashCode() {
      417 if (hashCode == -1) {
      418 hashCode = (int)((mostSigBits >> 32) ^
      419 mostSigBits ^
      420 (leastSigBits >> 32) ^
      421 leastSigBits);
      422 }
      423 return hashCode;
      424 }

      This code is broken in two ways, the first is as explained above for variant. The second is that this code reads the hashCode field twice. There is no guarantee that the second value with be the same as the first. The first read could return 0 (totally uninitialized), and the second could return -1 "uninitialized," in which case the method would return -1 incorrectly. In fact, in the total absence of locking, the second read could return an "earlier" value: the first could return the properly computed value, and the second could (erroneously) return 0 or -1. Again, the computation is far too cheap to justify lazy initialization, so the obvious fixes are either to make the hashCode field constant and initialize it at UUID creation time or (if deserialization proves too painful) to simply recalculate it each time:

          public int hashCode() {
              return (int) ( (mostSigBits >> 32) ^ mostSigBits
                             ^ (leastSigBits >> 32) ^ leastSigBits);
          }

      The current equals computation, while correct, is needlessly complex:

          public boolean equals(Object obj) {
          if (!(obj instanceof UUID))
              return false;
              if (((UUID)obj).variant() != this.variant ())
                  return false;
              UUID id = (UUID)obj;
              return (mostSigBits == id.mostSigBits &&
                      leastSigBits == id.leastSigBits);
          }

      The check on variant() is extraneous, and highly unlikely to improve performance. Better is:

          public boolean equals(Object obj) {
              if (!(obj instanceof UUID))
                  return false;
              UUID id = (UUID)obj;
              return mostSigBits == id.mostSigBits &&
                     leastSigBits == id.leastSigBits;
          }

      Perhaps better still is:

          public boolean equals(Object obj) {
              if (!(obj instanceof UUID))
                  return false;
              UUID id = (UUID)obj;
              return mostSigBits == id.mostSigBits &
                     leastSigBits == id.leastSigBits;
          }

      The latter is branch-free, and the branch in the former is unpredictable. Admittely the former is more idiomatic, but it may be time to change the idiom.

      To summarize, the caching for version, variant, timestamp, sequence, node, and hashCode is broken. The two obvious choices to fix it are make these fields final (set upon object creation), and to eliminate them entirely and recalculate the values each time they're requested. The former is likely to have marginally better time performance, and the latter better space performance. The former will require some effort due to deficiencies in the serializatin mechanism. Finally, the equals method is unduly complex, and slower than necessary.

            mduigou Mike Duigou
            martin Martin Buchholz
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

              Created:
              Updated:
              Resolved:
              Imported:
              Indexed: