FULL PRODUCT VERSION :
java version "9.0.1"
Java(TM) SE Runtime Environment (build 9.0.1+11)
Java HotSpot(TM) 64-Bit Server VM (build 9.0.1+11, mixed mode)
FULL OS VERSION :
Linux pnod0336 3.10.0-514.6.1.el7.x86_64 #1 SMP Wed Jan 18 13:06:36 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
EXTRA RELEVANT SYSTEM CONFIGURATION :
Intel(R) Xeon Phi(TM) CPU 7210 @ 1.30GHz
A DESCRIPTION OF THE PROBLEM :
The following simple program produces different results on KNL depending on whether -XX:UseAVX=3 is set.
The Java code does some floating-point computation (which probably triggers AVX-512 code generation) and, partway through, calls a native method via JNI. The native implementation is compiled for AVX-512 with the Intel compiler:
icc version 17.0.3 (gcc version 4.8.5 compatibility)
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
See the code below.
cat ./reproduce.sh:
echo "Compiling native code"
icc ./systeminfoJNI.c -I$JAVA_HOME/include -I$JAVA_HOME/include/linux -xMIC-AVX512 -fPIC -std=gnu99 -o libSystemInfo.so -shared
echo "Compiling java"
javac TestAVX.java
echo "Executing test program with -XX:UseAVX=3"
java -Djava.library.path=. -XX:UseAVX=3 TestAVX
echo "Executing test program without UseAVX"
java -Djava.library.path=. TestAVX
EXPECTED VERSUS ACTUAL BEHAVIOR :
The sample program prints 0 when run with -XX:UseAVX=3. It should print the same value as a run without AVX-512 enabled:
1711
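For reference, the expected value follows from the arithmetic in the constructor: for any thread count from 1 to 512, Math.min(1.0f, 2.0f * 256 / n) is 1.0f, so hostMicEquivalents is 0.5f and (long) (0.5f * 3423.90f) truncates to 1711. The sketch below repeats that arithmetic in plain Java with the JNI result replaced by an ordinary int parameter. The class name ExpectedValue and the sample thread counts are assumptions for illustration and are not part of the original reproducer.

```java
public class ExpectedValue {
    // Constant copied from TraceCacheSizeEstimator in the report.
    static final float CORE_WEIGHT_KNL_HOST = .5f;

    // Same arithmetic as getHostMicEquivalents followed by the
    // defaultTraceCacheSize cast. The int parameter stands in for the
    // value returned by the native method (assumption for illustration).
    static long defaultTraceCacheSize(int hardwareThreads) {
        float hostMicEquivalents =
                CORE_WEIGHT_KNL_HOST * Math.min(1.0f, 2.0f * 256 / hardwareThreads);
        return (long) (hostMicEquivalents * 3423.90f);
    }

    public static void main(String[] args) {
        // Hypothetical thread counts; each one yields 1711.
        for (int threads : new int[] {1, 64, 256, 512}) {
            System.out.println(threads + " -> " + defaultTraceCacheSize(threads));
        }
    }
}
```

Since every thread count in that range gives 1711, an output of 0 suggests that either the value returned from the native call or one of the intermediate floats is not what the Java source implies.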
REPRODUCIBILITY :
This bug can always be reproduced.
---------- BEGIN SOURCE ----------
class TestAVX {
    public static class SystemInfo {
        static {
            System.loadLibrary("SystemInfo");
        }
        public native int getNumberOfHardwareThreads();
    }

    static class TraceCacheSizeEstimator {
        private static final float CORE_WEIGHT_KNL_HOST = .5f;
        static final float LIVE_SAMP_FRAC = 213f;
        static final float FAST_OUTSAMPS = 1.e6f;
        public final long defaultTraceCacheSize;
        public final long minimumTraceCacheSize;
        public final long traceObjectSize;

        public TraceCacheSizeEstimator(
                int numberOfSamples, int numberOfHorizonPoints, int numberOfBins, int numberOfFilters,
                SystemInfo systemInfo) {
            float averageNumberOfOutputLocationsInAperture = 13213213.21f;
            float averageNumberOfLiveSamplesInAperture = LIVE_SAMP_FRAC * numberOfSamples * averageNumberOfOutputLocationsInAperture;
            averageNumberOfLiveSamplesInAperture = Math.max(averageNumberOfLiveSamplesInAperture, FAST_OUTSAMPS);
            traceObjectSize = 25665;
            float hostMicEquivalents = getHostMicEquivalents(systemInfo);
            float allMicEquivalents = hostMicEquivalents;
            defaultTraceCacheSize = (long) (allMicEquivalents * 3423.90f);
            minimumTraceCacheSize = (long) (allMicEquivalents * 3213213.f);
        }

        private float getHostMicEquivalents(SystemInfo systemInfo) {
            return CORE_WEIGHT_KNL_HOST * Math.min(1.0f, 2.0f * 256 / systemInfo.getNumberOfHardwareThreads());
        }
    }

    private static TraceCacheSizeEstimator create() {
        int numberOfSamples = 15;
        int numberOfHorizonPoints = 1;
        int numberOfBins = 81;
        int numberOfFilters = 4;
        return new TraceCacheSizeEstimator(
                numberOfSamples, numberOfHorizonPoints, numberOfBins, numberOfFilters,
                new SystemInfo());
    }

    public static void main(String[] argv) throws Exception {
        TraceCacheSizeEstimator trc = create();
        System.out.println(String.format("%d", trc.defaultTraceCacheSize));
    }
}
systeminfoJNI.c :
#define _GNU_SOURCE
#include <sched.h>
#include "jni.h"

int duthread_ncpus() {
    cpu_set_t cs;
    CPU_ZERO(&cs);
    sched_getaffinity(0, sizeof(cs), &cs);
    int count = 0;
    for (int i = 0; i < CPU_SETSIZE; i++)
    {
        if (CPU_ISSET(i, &cs))
            count++;
    }
    return count;
}

JNIEXPORT jint JNICALL Java_TestAVX_00024SystemInfo_getNumberOfHardwareThreads
  (JNIEnv * env, jobject obj) {
    return duthread_ncpus();
}
---------- END SOURCE ----------
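To narrow down which step diverges between the two runs, it may help to print each intermediate value of the computation together with its raw IEEE-754 bits. The following is a minimal diagnostic sketch, not part of the original reproducer: the class StepTrace and its methods are hypothetical names, and Runtime.availableProcessors() is used only as a stand-in for the JNI call so the sketch runs without the native library.

```java
import java.util.function.IntSupplier;

public class StepTrace {
    // Formats a float with its raw bit pattern so two runs can be compared exactly.
    static String show(String label, float value) {
        return String.format("%s = %s (bits 0x%08x)",
                label, value, Float.floatToRawIntBits(value));
    }

    // Replays getHostMicEquivalents and the final cast one step at a time,
    // printing every intermediate result.
    static long trace(IntSupplier threadSource) {
        int threads = threadSource.getAsInt();
        System.out.println("threads = " + threads);

        float ratio = 2.0f * 256 / threads;
        System.out.println(show("ratio", ratio));

        float capped = Math.min(1.0f, ratio);
        System.out.println(show("capped", capped));

        float equivalents = 0.5f * capped;   // CORE_WEIGHT_KNL_HOST in the report
        System.out.println(show("equivalents", equivalents));

        float scaled = equivalents * 3423.90f;
        System.out.println(show("scaled", scaled));

        long result = (long) scaled;
        System.out.println("result = " + result);
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the JNI call (assumption). To exercise the native path,
        // pass new TestAVX.SystemInfo()::getNumberOfHardwareThreads instead.
        trace(() -> Runtime.getRuntime().availableProcessors());
    }
}
```

Running this once with -XX:UseAVX=3 and once without, with the native method swapped in, should show whether the thread count itself or one of the later float values differs. Note that the extra println calls change the compiled code, so the failure may not reproduce in exactly the same way.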
CUSTOMER SUBMITTED WORKAROUND :
Do not enable AVX-512 (run without -XX:UseAVX=3).