-
Bug
-
Resolution: Unresolved
-
P4
-
11
-
generic
-
generic
ADDITIONAL SYSTEM INFORMATION :
Description: Ubuntu 23.10
Release: 23.10
Linux xxxxx 6.5.0-27-generic #28-Ubuntu SMP PREEMPT_DYNAMIC Thu Mar 7 18:21:00 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
java version "22.0.1" 2024-04-16
Java(TM) SE Runtime Environment (build 22.0.1+8-16)
Java HotSpot(TM) 64-Bit Server VM (build 22.0.1+8-16, mixed mode, sharing)
openjdk version "1.8.0_412"
OpenJDK Runtime Environment (Temurin)(build 1.8.0_412-b08)
OpenJDK 64-Bit Server VM (Temurin)(build 25.412-b08, mixed mode)
A DESCRIPTION OF THE PROBLEM :
Description
I discovered that in Java 22 (and at least Java 21) the creation of factories, specifically via
DocumentBuilderFactory.newInstance ()
SAXParserFactory.newInstance ()
TransformerFactory.newInstance ()
causes third-party parsers and transformers, located in the classpath, to be reloaded over and over again. Other factories may or may not be affected (e.g. XMLInputFactory seems not to be affected). This behaviour can not be reproduced in Java 8.
The effect is especially dramatic if the third party library is digitally signed (like the used SAXON XSL/T processor), as can be seen in the numbers below. In this case, each loading of the library causes the verification of the signatures, i.e. the calculation of the hashes of the contained class files and calculation of the signature, which consumes a lot of time. I don't know whether the reloading also has other side effects. It could be that due to the frequent reloading, the code does not get JIT compiled and will always get interpreted, causing extra runtime penalties.
I have verified the behaviour by running the "strace" tool with the test application. The number of file read operations on the JAR files is linear to the number of iterations requested (plus a little offset). I have also performed a profiling run with the help of the gprofng tool, which showed that most of the time is spent in the class loading code and in the cryptographic code.
The frequent use of the factory creation is not part of our own code. I could track down the use to the PostgreSQL JDBC driver. I guess the current usage pattern goes back to the time where the factories were marked in the Javadocs as not being thread-safe. The comments in the Javadoc have disappeared not later than with Java 8, but obviously the code remained unchanged. I have opened an issue for the developers of the JDBC driver (https://github.com/pgjdbc/pgjdbc/issues/3221).
I don't know whether this is simply a bug or whether there is a reasoning behind the reloading of the JARs. As the classpath typically does not change, there is at least no obvious reason. It would be nice if you could investigate the issue.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Steps to reproduce:
- create a Gradle project, using the given Gradle and source files
- run the program with the test type (dom|parser|transformer|input)
and the number of iterations as command line parameters
- use tools like strace, gperfng or debugger to further investigate
the behavour
measured times on my system:
jdk8u412-b08:
number of iterations: 5000
test "dom": 219ms
number of iterations: 5000
test "parser": 243ms
number of iterations: 5000
test "transformer": 598ms
number of iterations: 5000
test "input": 92ms
jdk-22.0.1:
number of iterations: 5000
test "dom": 368ms
number of iterations: 10000
test "dom": 611ms
number of iterations: 5000
test "parser": 352ms
number of iterations: 10000
test "parser": 576ms
number of iterations: 5000
test "transformer": 19995ms
number of iterations: 10000
test "transformer": 37976ms
number of iterations: 5000
test "input": 142ms
number of iterations: 10000
test "input": 172ms
---------- BEGIN SOURCE ----------
- - - Main.java - - -
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.transform.TransformerFactory;
public class Main
{
/**
* main
*
* @param args the command line arguments
* @throws Exception on any errors
*/
public static void main (String[] args)
throws Exception
{
if (args.length != 2)
{
System.err.println ("<test> <iterations>");
System.err.println ("test: dom | parser | transformer | input");
return;
}
Runnable r;
String test = args[0];
switch (test)
{
case "dom":
r = DocumentBuilderFactory::newInstance;
break;
case "parser":
r = SAXParserFactory::newInstance;
break;
case "transformer":
r = TransformerFactory::newInstance;
break;
case "input":
r = XMLInputFactory::newInstance;
break;
default:
System.err.println ("unknown test \"" + test + "\"");
return;
}
int iterations;
try
{
iterations = Integer.parseInt (args[1]);
} catch (NumberFormatException e)
{
System.err.println ("invalid number of iterations");
return;
}
System.out.println ("number of iterations: " + iterations);
long start = System.currentTimeMillis ();
for (int i = 0; i < iterations; i++)
r.run ();
long end = System.currentTimeMillis ();
System.out.println ("test \"" + test + "\": " + (end - start) + "ms");
}
}
- - - build.gradle - - -
plugins
{
id 'application'
}
sourceCompatibility = '1.8'
targetCompatibility = '1.8'
application
{
mainClass = "Main"
}
repositories
{
maven
{
url 'https://nexus.devel.dtm.knipp.de/repository/maven-public'
}
}
dependencies
{
implementation 'xerces:xercesImpl:[2,3)'
implementation 'net.sf.saxon:Saxon-HE:[9.6,9.7)'
}
- - - end - - -
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
Avoid recreation of the factories over and over again. Since the factories seem to be threadsafe in the meantime, there is no longer a need for it. For the PostgreSQL JDBC driver, there seems to be a workaround by using a custom factory-factory implementation.
FREQUENCY : always
Description: Ubuntu 23.10
Release: 23.10
Linux xxxxx 6.5.0-27-generic #28-Ubuntu SMP PREEMPT_DYNAMIC Thu Mar 7 18:21:00 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
java version "22.0.1" 2024-04-16
Java(TM) SE Runtime Environment (build 22.0.1+8-16)
Java HotSpot(TM) 64-Bit Server VM (build 22.0.1+8-16, mixed mode, sharing)
openjdk version "1.8.0_412"
OpenJDK Runtime Environment (Temurin)(build 1.8.0_412-b08)
OpenJDK 64-Bit Server VM (Temurin)(build 25.412-b08, mixed mode)
A DESCRIPTION OF THE PROBLEM :
Description
I discovered that in Java 22 (and at least Java 21) the creation of factories, specifically via
DocumentBuilderFactory.newInstance ()
SAXParserFactory.newInstance ()
TransformerFactory.newInstance ()
causes third-party parsers and transformers, located in the classpath, to be reloaded over and over again. Other factories may or may not be affected (e.g. XMLInputFactory seems not to be affected). This behaviour can not be reproduced in Java 8.
The effect is especially dramatic if the third party library is digitally signed (like the used SAXON XSL/T processor), as can be seen in the numbers below. In this case, each loading of the library causes the verification of the signatures, i.e. the calculation of the hashes of the contained class files and calculation of the signature, which consumes a lot of time. I don't know whether the reloading also has other side effects. It could be that due to the frequent reloading, the code does not get JIT compiled and will always get interpreted, causing extra runtime penalties.
I have verified the behaviour by running the "strace" tool with the test application. The number of file read operations on the JAR files is linear to the number of iterations requested (plus a little offset). I have also performed a profiling run with the help of the gprofng tool, which showed that most of the time is spent in the class loading code and in the cryptographic code.
The frequent use of the factory creation is not part of our own code. I could track down the use to the PostgreSQL JDBC driver. I guess the current usage pattern goes back to the time where the factories were marked in the Javadocs as not being thread-safe. The comments in the Javadoc have disappeared not later than with Java 8, but obviously the code remained unchanged. I have opened an issue for the developers of the JDBC driver (https://github.com/pgjdbc/pgjdbc/issues/3221).
I don't know whether this is simply a bug or whether there is a reasoning behind the reloading of the JARs. As the classpath typically does not change, there is at least no obvious reason. It would be nice if you could investigate the issue.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Steps to reproduce:
- create a Gradle project, using the given Gradle and source files
- run the program with the test type (dom|parser|transformer|input)
and the number of iterations as command line parameters
- use tools like strace, gperfng or debugger to further investigate
the behavour
measured times on my system:
jdk8u412-b08:
number of iterations: 5000
test "dom": 219ms
number of iterations: 5000
test "parser": 243ms
number of iterations: 5000
test "transformer": 598ms
number of iterations: 5000
test "input": 92ms
jdk-22.0.1:
number of iterations: 5000
test "dom": 368ms
number of iterations: 10000
test "dom": 611ms
number of iterations: 5000
test "parser": 352ms
number of iterations: 10000
test "parser": 576ms
number of iterations: 5000
test "transformer": 19995ms
number of iterations: 10000
test "transformer": 37976ms
number of iterations: 5000
test "input": 142ms
number of iterations: 10000
test "input": 172ms
---------- BEGIN SOURCE ----------
- - - Main.java - - -
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.transform.TransformerFactory;
public class Main
{
/**
* main
*
* @param args the command line arguments
* @throws Exception on any errors
*/
public static void main (String[] args)
throws Exception
{
if (args.length != 2)
{
System.err.println ("<test> <iterations>");
System.err.println ("test: dom | parser | transformer | input");
return;
}
Runnable r;
String test = args[0];
switch (test)
{
case "dom":
r = DocumentBuilderFactory::newInstance;
break;
case "parser":
r = SAXParserFactory::newInstance;
break;
case "transformer":
r = TransformerFactory::newInstance;
break;
case "input":
r = XMLInputFactory::newInstance;
break;
default:
System.err.println ("unknown test \"" + test + "\"");
return;
}
int iterations;
try
{
iterations = Integer.parseInt (args[1]);
} catch (NumberFormatException e)
{
System.err.println ("invalid number of iterations");
return;
}
System.out.println ("number of iterations: " + iterations);
long start = System.currentTimeMillis ();
for (int i = 0; i < iterations; i++)
r.run ();
long end = System.currentTimeMillis ();
System.out.println ("test \"" + test + "\": " + (end - start) + "ms");
}
}
- - - build.gradle - - -
plugins
{
id 'application'
}
sourceCompatibility = '1.8'
targetCompatibility = '1.8'
application
{
mainClass = "Main"
}
repositories
{
maven
{
url 'https://nexus.devel.dtm.knipp.de/repository/maven-public'
}
}
dependencies
{
implementation 'xerces:xercesImpl:[2,3)'
implementation 'net.sf.saxon:Saxon-HE:[9.6,9.7)'
}
- - - end - - -
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
Avoid recreation of the factories over and over again. Since the factories seem to be threadsafe in the meantime, there is no longer a need for it. For the PostgreSQL JDBC driver, there seems to be a workaround by using a custom factory-factory implementation.
FREQUENCY : always