- Bug
- Resolution: Fixed
- P4
- 11, 17, 21, 23
- b12
- generic
- generic
- Verified
A DESCRIPTION OF THE PROBLEM :
The objects accumulated because the "Cleanup-SunPKCS11" thread, which runs the NativeResourceCleaner runnable, was no longer present in the thread dumps. As a result, these objects were never cleaned up.
We monitored the system for several more days until the issue eventually happened again, and we were able to see the unhandled exception that caused the cleanup thread to die:
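A check like the following can flag the missing thread from inside the application, rather than from a thread dump. This is only a minimal sketch: the class and method names are hypothetical, and it assumes the cleaner thread is visible through Thread.getAllStackTraces() under the same name that appears in the thread dumps.

```java
public class CleanupThreadCheck {
    // Thread name as it appears in the thread dumps described above.
    static final String CLEANER_NAME = "Cleanup-SunPKCS11";

    /** Returns true if a live thread with the given name exists in this JVM. */
    static boolean isThreadAlive(String name) {
        // getAllStackTraces() returns a snapshot of all live threads.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && name.equals(t.getName())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The cleaner thread only exists once the PKCS11 provider is in use,
        // so "false" is expected in a JVM that never loaded it.
        System.out.println(CLEANER_NAME + " alive: " + isThreadAlive(CLEANER_NAME));
    }
}
```

Calling this periodically and alerting when the result changes from true to false would surface the problem before the leak becomes large.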
2024-01-27T06:46:37.319+0100|ERROR|Cleanup-SunPKCS11:30:|419|com.polycom.proximo.core.DefaultUncaughtExceptionHandler| Uncaught Exception from thread Cleanup-SunPKCS11
java.security.ProviderException: Internal error: objects created -1
at sun.security.pkcs11.Session.removeObject(Session.java:103) ~[jdk.crypto.cryptoki:?]
at sun.security.pkcs11.SessionKeyRef.updateNativeKey(P11Key.java:1393) ~[jdk.crypto.cryptoki:?]
at sun.security.pkcs11.SessionKeyRef.removeNativeKey(P11Key.java:1374) ~[jdk.crypto.cryptoki:?]
at sun.security.pkcs11.SessionKeyRef.dispose(P11Key.java:1409) ~[jdk.crypto.cryptoki:?]
at sun.security.pkcs11.P11Key.drainRefQueue(P11Key.java:174) ~[jdk.crypto.cryptoki:?]
at sun.security.pkcs11.SunPKCS11$NativeResourceCleaner.run(SunPKCS11.java:1121) ~[jdk.crypto.cryptoki:?]
at java.lang.Thread.run(Thread.java:829) [?:?]
at jdk.internal.misc.InnocuousThread.run(InnocuousThread.java:161) ~[?:?]
SessionKeyRef.updateNativeKey() calls Session.removeObject(), which throws a ProviderException if the value returned by decrementAndGet() on the createdObjects AtomicInteger is < 0. Because updateNativeKey() doesn't catch it, the exception goes unhandled and the cleanup thread dies. From this point on, there is no cleanup thread to free the SessionKeyRef instances in refSet, so they leak until a crash or reboot.
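The failure mode can be illustrated outside the JDK with a simplified model. In this sketch, FakeSession and the "demo-cleaner" thread are hypothetical stand-ins and not the real sun.security.pkcs11 classes; only the counter check mirrors the behaviour described above.

```java
import java.security.ProviderException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class CleanerDeathDemo {
    /** Hypothetical stand-in for the per-session object counter. */
    static final class FakeSession {
        private final AtomicInteger createdObjects = new AtomicInteger();

        void addObject() {
            createdObjects.incrementAndGet();
        }

        void removeObject() {
            int n = createdObjects.decrementAndGet();
            if (n < 0) {
                throw new ProviderException("Internal error: objects created " + n);
            }
        }
    }

    /**
     * Creates `created` objects, queues `pending` removals, and drains the
     * queue on a thread with no exception handling in its loop.
     * Returns the number of removals that were never processed.
     */
    static int runUnguardedCleaner(int created, int pending) throws InterruptedException {
        FakeSession session = new FakeSession();
        for (int i = 0; i < created; i++) {
            session.addObject();
        }
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < pending; i++) {
            queue.add(session::removeObject);
        }

        Thread cleaner = new Thread(() -> {
            Runnable r;
            while ((r = queue.poll()) != null) {
                r.run(); // an exception here ends the loop and the thread
            }
        }, "demo-cleaner");
        cleaner.setUncaughtExceptionHandler((t, e) ->
                System.err.println("Uncaught exception from thread " + t.getName() + ": " + e));
        cleaner.start();
        cleaner.join();
        return queue.size(); // whatever is left is what "leaks"
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("unprocessed: " + runUnguardedCleaner(1, 5));
    }
}
```

With one created object and five queued removals, the second removal drives the counter below zero, the thread terminates, and the remaining entries stay in the queue indefinitely.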
Because this is happening on a customer production system, we don't know what specific traffic pattern causes this exception. The issue occurs at random after many days of running in a heavily loaded production environment, so we have no realistic way to collect that data, and I can't provide a clean test case. Regardless of the cause, the cleanup thread shouldn't be allowed to die, because its death causes a leak.
The simplest fix would probably be for updateNativeKey() to check session.hasObjects() before calling session.removeObject(), but there are plenty of other ways to deal with the issue (such as catching the ProviderException in updateNativeKey()). Either way, the current implementation is susceptible to this leak and should be fixed.
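The two options mentioned above might look roughly like this. This is a sketch against a hypothetical FakeSession model, not the actual JDK code or the fix that was eventually applied; the helper names removeIfPresent and removeQuietly are invented for illustration.

```java
import java.security.ProviderException;
import java.util.concurrent.atomic.AtomicInteger;

public class GuardedRemovalSketch {
    /** Hypothetical stand-in for the session's object counter. */
    static final class FakeSession {
        private final AtomicInteger createdObjects = new AtomicInteger();

        void addObject() {
            createdObjects.incrementAndGet();
        }

        boolean hasObjects() {
            return createdObjects.get() != 0;
        }

        void removeObject() {
            int n = createdObjects.decrementAndGet();
            if (n < 0) {
                throw new ProviderException("Internal error: objects created " + n);
            }
        }
    }

    /** Option 1: check hasObjects() first. Returns true if a removal happened. */
    static boolean removeIfPresent(FakeSession session) {
        if (!session.hasObjects()) {
            return false; // nothing to remove, so skip the call that would throw
        }
        session.removeObject();
        return true;
    }

    /** Option 2: catch the exception so the calling thread survives. */
    static boolean removeQuietly(FakeSession session) {
        try {
            session.removeObject();
            return true;
        } catch (ProviderException e) {
            return false; // the counter has already been decremented at this point
        }
    }

    public static void main(String[] args) {
        FakeSession s = new FakeSession();
        s.addObject();
        System.out.println(removeIfPresent(s)); // removal performed
        System.out.println(removeIfPresent(s)); // skipped, counter is zero
        System.out.println(removeQuietly(s));   // exception swallowed
    }
}
```

Neither variant is free of trade-offs in this model: the check-then-remove form is not atomic if several threads remove concurrently, and the catch form leaves the counter negative after the exception. Both do keep the calling thread alive.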
A simple workaround is to use a different security provider. We were already doing this for our customers because of a bug I reported a few weeks ago (JDK-8324585), an unrelated native memory leak that prompted us to switch customer systems to the Sun security provider.
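The report does not say how the provider was switched or which algorithms were involved. As one illustration, an application can request an implementation from a named provider explicitly. The sketch below assumes a standard OpenJDK build where the "SUN" provider supplies SHA-256; the class and method names are hypothetical.

```java
import java.security.MessageDigest;
import java.security.Provider;
import java.security.Security;

public class ProviderSelectionSketch {
    /** Requests the algorithm from a specific provider and returns that provider's name. */
    static String digestProvider(String algorithm, String providerName) throws Exception {
        MessageDigest md = MessageDigest.getInstance(algorithm, providerName);
        return md.getProvider().getName();
    }

    public static void main(String[] args) throws Exception {
        // List the installed providers in preference order.
        for (Provider p : Security.getProviders()) {
            System.out.println(p.getName());
        }
        System.out.println("SHA-256 served by: " + digestProvider("SHA-256", "SUN"));
    }
}
```

Changing the provider order in the java.security configuration is another common way to achieve the same effect without code changes.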
CUSTOMER SUBMITTED WORKAROUND :
Use a different security provider.
FREQUENCY : rarely
- relates to
- JDK-8324585 JVM native memory leak in PCKS11-NSS security provider (Resolved)