- Enhancement
- Resolution: Unresolved
- P4
- 8, 15, 16
- x86_64
- generic
ADDITIONAL SYSTEM INFORMATION :
Ubuntu 14.04 & Oracle JDK 8
Ubuntu 18.04/20.04 & OpenJDK 11
CPUs tested: Ryzen 3970X, Xeon E5 v3, Xeon E5 v4 and Xeon scalable
A DESCRIPTION OF THE PROBLEM :
I have constructed a simple compute-intensive test, initially designed to compare the speed of different CPU architectures. The test runs a loop of 1 billion iterations; each iteration initializes two variables and swaps them through a temporary variable. To keep the JIT from optimizing the work away, the swapped values are accumulated into two sums that are printed after the loop. When running the test without any JVM parameters, I found that the first two runs of the loop are about 50% faster than all later ones, and that this speed is never reached again, even if the test runs continuously for hours. This is contrary to the usual expectation that the first iterations are slower than the following ones. Given that the printed results are identical and that the speed of the first iterations is never regained, this suggests either a performance regression in the JVM or a large untapped potential for optimizing C-like compute-intensive code, characterized by little or no heap allocation and heavy use of primitives.
Changing tiered compilation JVM options yields no benefit or makes the execution worse.
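When comparing runs under different tiered compilation settings, it may help to log what the JVM was actually started with next to each measurement. The following is a minimal sketch using the standard `java.lang.management` API; the class name `JitSettingsReport` is hypothetical, and the report above does not say which options were tried (flags such as `-XX:-TieredCompilation` or `-XX:TieredStopAtLevel=1` are only assumed examples).

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper, not part of the original report.
public class JitSettingsReport {

    // Builds a short description of the JIT configuration of the running JVM.
    static String describe() {
        // Options passed on the command line, e.g. -XX:... flags.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        // May be null if the JVM has no compilation system.
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();

        StringBuilder report = new StringBuilder();
        report.append("JVM arguments: ").append(jvmArgs).append('\n');
        if (jit == null) {
            report.append("JIT compiler: none reported");
        } else {
            report.append("JIT compiler: ").append(jit.getName());
            if (jit.isCompilationTimeMonitoringSupported()) {
                report.append(", total compilation time (ms): ")
                      .append(jit.getTotalCompilationTime());
            }
        }
        return report.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Calling `describe()` before and after the measured loop would show whether compilation time is still increasing while the slower timings are observed.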
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Compile and run the code attached in the code section.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Execution time stable around a minimum value.
ACTUAL -
Execution time stable but around 50% slower than the first two iterations.
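To put a number on the gap between the first two runs and the steady state, the per-run times can be collected and summarized. This is only a sketch under assumptions: the class `IterationTimingSketch`, the run count of 20, and the use of `System.nanoTime()` instead of `System.currentTimeMillis()` are choices made for illustration and are not part of the original test.

```java
import java.util.Arrays;

// Hypothetical measurement harness, not part of the original report.
public class IterationTimingSketch {

    // Written after each run so the sums stay observable.
    static long sink;

    // Runs the swap-and-accumulate loop once and returns the elapsed nanoseconds.
    static long timeOnce(long iterations) {
        long start = System.nanoTime();
        long sumA = 0;
        long sumB = 0;
        for (long index = 0; index < iterations; index++) {
            long a = index;
            long b = index << 2;
            long temp = a;
            a = b;
            b = temp;
            sumA += a;
            sumB += b;
        }
        sink = sumA ^ sumB;
        return System.nanoTime() - start;
    }

    // Returns the upper median of the values without modifying the input array.
    static long median(long[] values) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    public static void main(String[] args) {
        int runs = 20;                    // assumed run count
        long iterations = 1_000_000_000L; // same loop length as the report
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            nanos[i] = timeOnce(iterations);
        }
        long early = median(Arrays.copyOfRange(nanos, 0, 2));
        long late = median(Arrays.copyOfRange(nanos, 2, runs));
        System.out.println("first two runs (ms): " + early / 1_000_000
                + ", later runs (ms): " + late / 1_000_000);
    }
}
```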
---------- BEGIN SOURCE ----------
public class VariableExchangeTest {
    public static void main(String[] args) {
        VariableExchangeTest test = new VariableExchangeTest();
        for (int i = 0; i < 10000; i++) {
            test.testVariableExchange();
        }
    }

    public void testVariableExchange() {
        long begin = System.currentTimeMillis();
        long sumA = 0;
        long sumB = 0;
        for (long index = 0; index < 1000000000; index++) {
            long a = index;
            long b = index << 2;
            long temp = a;
            a = b;
            b = temp;
            sumA += a;
            sumB += b;
        }
        long end = System.currentTimeMillis();
        System.out.println("XCG Sum a: " + sumA + ", b: " + sumB + ", time: " + (end - begin));
    }
}
---------- END SOURCE ----------
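Since the report relies on the printed sums being identical between fast and slow runs, the expected values can also be checked independently in closed form. After the swap, `b` holds `index` and `a` holds `index << 2`, so `sumB = n * (n - 1) / 2` and `sumA = 4 * sumB` for `n` iterations. A minimal sketch, with the hypothetical class name `ExpectedSums`:

```java
// Hypothetical cross-check, not part of the original report.
public class ExpectedSums {

    // sumB accumulates index itself: 0 + 1 + ... + (n - 1) = n * (n - 1) / 2.
    // multiplyExact throws ArithmeticException instead of silently overflowing.
    static long expectedSumB(long n) {
        return Math.multiplyExact(n, n - 1) / 2;
    }

    // sumA accumulates index << 2, i.e. 4 * index, so it is four times sumB.
    static long expectedSumA(long n) {
        return Math.multiplyExact(4L, expectedSumB(n));
    }

    // Same loop body as the reproducer, parameterized by the iteration count.
    static long[] loopSums(long n) {
        long sumA = 0;
        long sumB = 0;
        for (long index = 0; index < n; index++) {
            long a = index;
            long b = index << 2;
            long temp = a;
            a = b;
            b = temp;
            sumA += a;
            sumB += b;
        }
        return new long[] {sumA, sumB};
    }

    public static void main(String[] args) {
        long n = 1_000_000_000L;
        System.out.println("expected a: " + expectedSumA(n) + ", b: " + expectedSumB(n));

        // Compare the loop against the closed form on a small iteration count.
        long[] small = loopSums(1_000);
        boolean matches = small[0] == expectedSumA(1_000) && small[1] == expectedSumB(1_000);
        System.out.println("loop matches closed form for n = 1000: " + matches);
    }
}
```

For the 1 billion iterations used in the report, the closed forms give a = 1999999998000000000 and b = 499999999500000000, both of which fit in a `long`.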
CUSTOMER SUBMITTED WORKAROUND :
None
FREQUENCY : always
- relates to JDK-8149745 C2 should optimize long accumulations in a counted loop (Closed)