Enhancement
Resolution: Unresolved
P4
20
generic
generic
A DESCRIPTION OF THE PROBLEM :
In C++ on x86-64 machines, compilers typically reduce a min function to two instructions when the code looks like this:
float min1(float x, float y) {
if(x < y) {
return x;
}
return y;
}
with the assembly looking something like this:
min1(float, float): # @min1(float, float)
minss %xmm1, %xmm0
retq
(Output generated by clang; gcc yields similar results.)
For the same code, Java generates the following (the surrounding stack-handling code is omitted):
min1(float, float):
vucomiss %xmm0,%xmm1 # Compare the two numbers
ja .L1 # If x < y, jump to return
vmovaps %xmm1,%xmm0 # Else, move y (xmm1) into x (xmm0)
retq # Return, with the result (y) now stored in xmm0
.L1:
ret # Return, with the result (x) still stored in xmm0
Notably, no minss instruction is used here, even though the Math.min intrinsic does use one (vminss):
float min2(float x, float y) {
return Math.min(x, y);
}
gives
min2(float, float):
vblendvps %xmm0,%xmm1,%xmm0,%xmm5
vblendvps %xmm0,%xmm0,%xmm1,%xmm3
vminss %xmm3,%xmm5,%xmm2
vcmpunordps %xmm5,%xmm5,%xmm3
vblendvps %xmm3,%xmm5,%xmm2,%xmm0
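It makes sense that the intrinsic does more than a single vminss, as it must still yield the same result as the Java implementation of Math.min (which it does). However, the compiler could also use the vminss instruction for the if statement in min1 and still have the same effect, since for NaN or equal inputs minss returns its second source operand, just as the comparison falls through to return y. The same approach could be applied to max, as well as to clamp implementations.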
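To make the semantic difference concrete, here is a minimal sketch comparing the comparison-based form with Math.min on the two edge cases where they diverge (NaN and signed zero). The class name MinSemantics and the max1 and clamp1 helpers are hypothetical, added only for illustration; they are written in the same if-statement style as min1 and are the kind of pattern the request has in mind, not code from the JDK.

```java
public class MinSemantics {
    // Comparison-based min, same shape as min1 in the report.
    // If either input is NaN, or both are zeros, the comparison is false
    // and y is returned.
    public static float min1(float x, float y) {
        if (x < y) {
            return x;
        }
        return y;
    }

    // Hypothetical max counterpart, written in the same style.
    public static float max1(float x, float y) {
        if (x > y) {
            return x;
        }
        return y;
    }

    // Hypothetical clamp built from the two comparison-based helpers.
    public static float clamp1(float x, float lo, float hi) {
        return min1(max1(x, lo), hi);
    }

    public static void main(String[] args) {
        // NaN handling: the comparison form returns y, Math.min returns NaN.
        System.out.println(min1(Float.NaN, 1.0f));     // 1.0
        System.out.println(Math.min(Float.NaN, 1.0f)); // NaN

        // Signed zero: the comparison form returns y, Math.min prefers -0.0.
        System.out.println(min1(-0.0f, 0.0f));         // 0.0
        System.out.println(Math.min(-0.0f, 0.0f));     // -0.0

        // Ordinary values behave as expected.
        System.out.println(clamp1(5.0f, 0.0f, 1.0f));  // 1.0
    }
}
```

These differences are why the Math.min intrinsic needs the extra blend and unordered-compare instructions, while the plain if statement has no such requirement.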