JDK-6921969

optimize 64 long multiply for case with high bits zero

Details

    • Type: Enhancement
    • Status: Resolved
    • Priority: P3
    • Resolution: Fixed
    • Affects Version: hs17
    • Fix Version: hs17
    • Component: hotspot
    • Labels: None
    • Resolved In Build: b09
    • CPU: sparc
    • OS: solaris_9

      Description

        Hi Tom, Christian, and others,

        Here's a patch I'd like to contribute:
        http://cr.openjdk.java.net/~rasbold/69XXXXX/webrev.00/

        With it, C2 generates shorter instruction sequences for long
        multiplication on x86_32 when the high 32 bits of the operands
        are known to be zero.
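
        As a rough illustration of why this helps (a sketch of the arithmetic
        only, not the code C2 emits): on a 32-bit target a general 64-bit
        multiply is assembled from three 32-bit multiplies, and two of them
        vanish when both high halves are zero. The class and method names
        below are hypothetical and do not come from the patch.

```java
public class LongMulSketch {
    // General 64x64 -> 64 multiply assembled from 32-bit halves,
    // the way a 32-bit target has to compute it (three multiplies).
    static long mulFull(long a, long b) {
        long aLo = a & 0xffffffffL, aHi = a >>> 32;
        long bLo = b & 0xffffffffL, bHi = b >>> 32;
        // Cross terms: only their low 32 bits survive the shift below.
        long cross = aHi * bLo + aLo * bHi;
        return aLo * bLo + (cross << 32);
    }

    // When both high halves are known to be zero, the cross terms are
    // zero and a single 32x32 -> 64 multiply is enough.
    static long mulHighZero(long a, long b) {
        return (a & 0xffffffffL) * (b & 0xffffffffL);
    }

    public static void main(String[] args) {
        long x = 0x89abcdefL;  // high 32 bits are zero
        long y = 0xfedcba98L;  // high 32 bits are zero
        System.out.println(mulFull(x, y) == x * y);      // true
        System.out.println(mulHighZero(x, y) == x * y);  // true
    }
}
```

        Operands masked with 0xffffffffL, as in the loop quoted in this
        report, always satisfy the high-bits-zero condition.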

        In particular, this applies to the loop in BigInteger.mulAdd():

           private final static long LONG_MASK = 0xffffffffL;

           static int mulAdd(int[] out, int[] in, int offset, int len, int k) {
               long kLong = k & LONG_MASK;
               long carry = 0;

               offset = out.length-offset - 1;
               for (int j=len-1; j >= 0; j--) {
                   long product = (in[j] & LONG_MASK) * kLong +
                                  (out[offset] & LONG_MASK) + carry;
                   out[offset--] = (int)product;
                   carry = product >>> 32;
               }
               return (int)carry;
           }
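
        To see the routine in isolation, here is a small standalone harness.
        It is only a sketch: mulAdd is copied from the listing above, while
        the MulAddCheck class and the input values are made up for
        illustration. Each multiply in the loop takes two operands masked to
        32 bits, which is the pattern the patch targets.

```java
public class MulAddCheck {
    private static final long LONG_MASK = 0xffffffffL;

    // Copied from the listing above: adds in[0..len) * k into out,
    // with offset counted from the least significant end of out,
    // and returns the final carry word.
    static int mulAdd(int[] out, int[] in, int offset, int len, int k) {
        long kLong = k & LONG_MASK;
        long carry = 0;

        offset = out.length - offset - 1;
        for (int j = len - 1; j >= 0; j--) {
            // Both multiply operands have their high 32 bits zero.
            long product = (in[j] & LONG_MASK) * kLong +
                           (out[offset] & LONG_MASK) + carry;
            out[offset--] = (int) product;
            carry = product >>> 32;
        }
        return (int) carry;
    }

    public static void main(String[] args) {
        int[] out = {1, 2};    // magnitude 2^32 + 2, most significant word first
        int[] in  = {-1, -1};  // magnitude 2^64 - 1
        int carry = mulAdd(out, in, 0, 2, -1);  // k is 2^32 - 1 when unsigned

        // (2^64 - 1) * (2^32 - 1) + 2^32 + 2 = 2^96 - 2^64 + 3
        System.out.println(Integer.toHexString(carry));  // ffffffff
        System.out.println(out[0] + " " + out[1]);       // 0 3
    }
}
```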

        In my measurements, one of our internal microbenchmarks that uses
        BigInteger.mulAdd sped up by about 12%. SPECjvm2008's crypto.rsa
        and crypto.signverify also improved, by about 7% and 2.3% respectively.

              People

                never Tom Rodriguez
                never Tom Rodriguez