JDK / JDK-8029302

Performance regression in Math.pow intrinsic


    • b15
    • linux

        FULL PRODUCT VERSION :
        java version "1.7.0_40"
        Java(TM) SE Runtime Environment (build 1.7.0_40-b43)
        Java HotSpot(TM) 64-Bit Server VM (build 24.0-b56, mixed mode)


        FULL OS VERSION :
        Linux spica 2.6.32-220.el6.x86_64 #1 SMP Tue Dec 6 19:48:22 GMT 2011 x86_64 x86_64 x86_64 GNU/Linux
        (CentOS 6)

        EXTRA RELEVANT SYSTEM CONFIGURATION :
        /proc/cpuinfo:
        processor: 0
        vendor_id: GenuineIntel
        cpu family: 6
        model: 45
        model name: Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
        stepping: 7
        cpu MHz: 2501.000
        cache size: 15360 KB
        physical id: 0
        siblings: 6
        core id: 0
        cpu cores: 6
        apicid: 0
        initial apicid: 0
        fpu: yes
        fpu_exception: yes
        cpuid level: 13
        wp: yes
        flags: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 x2apic popcnt aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
        bogomips: 4999.75
        clflush size: 64
        cache_alignment: 64
        address sizes: 46 bits physical, 48 bits virtual
        power management:

        processors 1 to 11 (two sockets, six cores each): vendor_id, cpu family, model, model name, stepping, cpu MHz, cache size, siblings, cpu cores, fpu, fpu_exception, cpuid level, wp, flags, clflush size, cache_alignment, address sizes and power management are identical to processor 0. The per-processor fields are:

        processor  physical id  core id  apicid  initial apicid  bogomips
        1          0            1        2       2               4999.24
        2          0            2        4       4               4999.23
        3          0            3        6       6               4999.24
        4          0            4        8       8               4999.24
        5          0            5        10      10              4999.24
        6          1            0        32      32              4999.28
        7          1            1        34      34              4999.30
        8          1            2        36      36              4999.29
        9          1            3        38      38              4999.29
        10         1            4        40      40              4999.27
        11         1            5        42      42              4999.27


        A DESCRIPTION OF THE PROBLEM :
        The Math.pow() implementation appears to have changed between 7u25 and 7u40, causing a significant performance regression.

        On my machine, the attached test case reports the following time per pass over the array:
         - 7u25: ~1700 ms
         - 7u40: ~8500 ms
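Dividing by the 100,000,000 iterations of the timed loop (each iteration is one Math.pow call plus an array load and an addition), the reported figures correspond to roughly a fivefold slowdown:

$$
\frac{8500\ \text{ms}}{1700\ \text{ms}} = 5,
\qquad
\frac{1700\ \text{ms}}{10^{8}} \approx 17\ \text{ns per iteration (7u25)},
\qquad
\frac{8500\ \text{ms}}{10^{8}} \approx 85\ \text{ns per iteration (7u40)}
$$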

        Running with "-XX:+UnlockDiagnosticVMOptions -XX:+PrintIntrinsics" confirms that the intrinsic is used in both cases.
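To separate the cost of the Math.pow path from the rest of the loop, a rough comparison against a plain multiplication can be run on both JDKs. This is a minimal sketch, not a rigorous benchmark (a harness such as JMH would give more reliable numbers); the class name PowVsMultiply, its methods, the fixed seed and the default sizes are assumptions and are not part of the original test case.

```java
import java.util.Random;

// Rough sketch, not a rigorous benchmark: times Math.pow(x, 2) against x * x on the
// same data. Class and method names, seed and default sizes are hypothetical.
public class PowVsMultiply {

    // Sums Math.pow(v, 2); returning the sum keeps the JIT from discarding the loop.
    static double sumPow(double[] values) {
        double sum = 0;
        for (double v : values) {
            sum += Math.pow(v, 2);
        }
        return sum;
    }

    // Baseline: the same loop with an explicit multiplication.
    static double sumMul(double[] values) {
        double sum = 0;
        for (double v : values) {
            sum += v * v;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 10_000_000;
        int rounds = args.length > 1 ? Integer.parseInt(args[1]) : 5;

        // Fixed seed so every JDK under test sees identical input.
        Random random = new Random(42);
        double[] values = new double[n];
        for (int i = 0; i < n; i++) {
            values[i] = random.nextDouble();
        }

        // Label the output with the VM that produced it.
        System.out.println("java.version=" + System.getProperty("java.version")
                + ", java.vm.version=" + System.getProperty("java.vm.version"));

        // Early rounds include JIT warm-up; compare the later ones.
        for (int r = 0; r < rounds; r++) {
            long t0 = System.nanoTime();
            double powSum = sumPow(values);
            long t1 = System.nanoTime();
            double mulSum = sumMul(values);
            long t2 = System.nanoTime();
            System.out.printf("round %d: pow %d ms, mul %d ms (sums %.6e, %.6e)%n",
                    r, (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, powSum, mulSum);
        }
    }
}
```

Run it as `java PowVsMultiply 10000000 5` under each JDK and compare the later rounds: if only the pow column grows between 7u25 and 7u40, the slowdown is in the pow path rather than in the surrounding loop.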


        THE PROBLEM WAS REPRODUCIBLE WITH -Xint FLAG: Yes

        THE PROBLEM WAS REPRODUCIBLE WITH -server FLAG: Yes

        REGRESSION. Last worked in version 7u25

        STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
        - Compile the attached code.
        - Run it with JDK 7u25 and with JDK 7u40 and compare the printed timings.

        REPRODUCIBILITY :
        This bug can always be reproduced.

        ---------- BEGIN SOURCE ----------
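        import java.util.Random;

        public class Main {

            public static void main(String[] args) throws Exception {

                // Runs indefinitely; later iterations are less affected by JIT warm-up.
                while (true) {

                    // 100,000,000 doubles is about 800 MB; a larger heap (e.g. -Xmx2g) may be needed.
                    final Random random = new Random();
                    final double[] values = new double[100_000_000];
                    for (int i = 0; i < values.length; i++)
                        values[i] = random.nextDouble();

                    // Request a GC (only a hint) so that collecting the previous array
                    // is less likely to happen inside the timed loop.
                    System.gc();

                    final long start = System.currentTimeMillis();

                    // Accumulate the results so the JIT cannot eliminate the Math.pow calls.
                    double blackhole = 0;
                    for (int i = 0; i < values.length; i++)
                        blackhole += Math.pow(values[i], 2);

                    final long elapsed = System.currentTimeMillis() - start;

                    System.out.println(elapsed + "ms (" + blackhole + ")");
                }
            }
        }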
        ---------- END SOURCE ----------
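Since the exponent in the test case is always the constant 2, an obvious workaround is to square with a multiplication. Math.pow is only specified to be within 1 ulp of the exact result, while x * x is correctly rounded, so the two can differ in the last bit. The sketch below, with a hypothetical helper square and a hypothetical class name PowSquareUlpCheck, measures the largest such difference over random inputs before the swap is made.

```java
import java.util.Random;

// Sketch: how far apart are Math.pow(x, 2) and x * x? The class name, the helper
// square() and the sample sizes are hypothetical.
public class PowSquareUlpCheck {

    // Candidate replacement for Math.pow(x, 2): a single, correctly rounded multiply.
    static double square(double x) {
        return x * x;
    }

    // Largest |Math.pow(x, 2) - x * x|, in units of ulp(x * x), over `samples`
    // pseudo-random inputs in [0, 1) drawn with the given seed.
    static double maxUlpDifference(long seed, int samples) {
        Random random = new Random(seed);
        double worst = 0;
        for (int i = 0; i < samples; i++) {
            double x = random.nextDouble();
            double viaMul = square(x);
            double viaPow = Math.pow(x, 2);
            // Math.ulp(0.0) is Double.MIN_VALUE, so this never divides by zero.
            double diff = Math.abs(viaPow - viaMul) / Math.ulp(viaMul);
            worst = Math.max(worst, diff);
        }
        return worst;
    }

    public static void main(String[] args) {
        System.out.println("max |pow - mul| in ulps: " + maxUlpDifference(42L, 1_000_000));
    }
}
```

If this reports at most about 1 ulp on the JDKs in question, replacing Math.pow(values[i], 2) with values[i] * values[i] in the test case should change the accumulated result only marginally.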

              Assignee: Niclas Adlertz (adlertz, inactive)
              Reporter: Webbug Group (webbuggrp)