Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-6530203

Arrays.hashCode(long a[]) calculates bad hash code for 0 and -1 values

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Won't Fix
    • Icon: P4 P4
    • None
    • 6
    • core-libs
    • x86
    • windows_xp

      FULL PRODUCT VERSION :
      java version "1.6.0"
      Java(TM) SE Runtime Environment (build 1.6.0-b105)
      Java HotSpot(TM) Client VM (build 1.6.0-b105, mixed mode, sharing)


      ADDITIONAL OS VERSION INFORMATION :
      Microsoft Windows XP [Version 5.1.2600]

      A DESCRIPTION OF THE PROBLEM :
      Please look at the implementation of Arrays.hashCode(long a[]) method. The basic loop is the following:

              for (long element : a) {
                  int elementHash = (int)(element ^ (element >>> 32));
                  result = 31 * result + elementHash;
              }

      It's obvious that if low and high words in the element are equal, then elementHash will be zero. There is, at least, two very popular integers producing this result: 0 and -1. For both values, the expression "(int)(element^(element>>>32))" returns 0.

      This problem really occurred in one of my applications. By default, long[] arrays are filled by zero, and -1 is used sometimes as usual or signal value. If I create new long array[], calculate hash code, then assign -1 to some elements and calculate hash code again, I see the same result. I think it is a bug: the probability of same hash codes for different arrays is too high and can lead to serious problems in algorithms that often assign -1 to elements of long[] arrays.


      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Please compile and run the attached test.

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      Different hash codes for the tested arrays.
      ACTUAL -
      Hash code is not changed while any long[] array modifications where 0 element is replaced with -1 or vice versa.

      REPRODUCIBILITY :
      This bug can be reproduced always.

      ---------- BEGIN SOURCE ----------
      import java.util.Arrays;
      public class Test {
          public static void main(String[] args) {
              long[] a = new long[10];
              for (int k = 0; k < a.length; k++)
                  System.out.print(a[k] + " ");
              System.out.println("hashCode = " + Arrays.hashCode(a));
              a[0] = -1;
              for (int k = 0; k < a.length; k++)
                  System.out.print(a[k] + " ");
              System.out.println("hashCode = " + Arrays.hashCode(a));
              a[5] = -1;
              for (int k = 0; k < a.length; k++)
                  System.out.print(a[k] + " ");
              System.out.println("hashCode = " + Arrays.hashCode(a));
          }
      }
      ---------- END SOURCE ----------

      CUSTOMER SUBMITTED WORKAROUND :
      You could use crc32 native algorithm for hash code calculation for any arrays. Measuring shows that your java.util.zip.CRC32.update(byte[] b) method has almost the same performance as Arrays.hashCode. Now I've created own methods for calculation array hash codes, that convert all non-byte primitive types to byte arrays and then call CRC32.update. I think you could implement more efficient solution if you offer native methods for arrays of all primitive types.

            martin Martin Buchholz
            ryeung Roger Yeung (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

              Created:
              Updated:
              Resolved:
              Imported:
              Indexed: