Name: mf23781 Date: 10/20/98
The following comments from Alan Webb describe the problem fixed here and in the related 112 defect 4323.
At the 112 level, the leak was caused by jni_DetachCurrentThread not releasing the
sys_thread_t (our native thread state) control block. To fix this I did the following:
- sysThreadFree (in threads_md.c) was changed so that it removes the sys_thread_t from
ThreadQueue when a JNI thread is being cleaned up. It does *not* release the
memory, because this routine is called from elsewhere and the tid is referenced after the call returns.
- sysThreadExit (in threads_md.c) was changed to free the sys_thread_t after verifying that this was a JNI thread.
- jni_DetachCurrentThread (in jni.c) was changed to call sysFree to release the sys_thread_t before returning.
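The split between "unlink" and "free" in the 112 changes can be sketched as follows. This is a simplified model, not the threads_md.c code: the struct layout, the global ThreadQueue list, and the functions thread_free_sketch, detach_sketch and attach_sketch are all hypothetical stand-ins. The point is that the cleanup routine only unlinks a JNI thread, and the detach path releases the block once nothing else refers to it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the native thread block.
 * Field names are illustrative, not the real threads_md.c layout. */
typedef struct sys_thread {
    struct sys_thread *next;   /* link in ThreadQueue */
    int is_jni;                /* nonzero if attached through JNI */
} sys_thread_t;

static sys_thread_t *ThreadQueue = NULL;   /* hypothetical global queue */

static void queue_insert(sys_thread_t *t) {
    t->next = ThreadQueue;
    ThreadQueue = t;
}

/* Unlink t from ThreadQueue; returns 1 if it was found. */
static int queue_remove(sys_thread_t *t) {
    sys_thread_t **pp = &ThreadQueue;
    while (*pp != NULL) {
        if (*pp == t) {
            *pp = t->next;
            t->next = NULL;
            return 1;
        }
        pp = &(*pp)->next;
    }
    return 0;
}

static int queue_length(void) {
    int n = 0;
    for (sys_thread_t *t = ThreadQueue; t != NULL; t = t->next)
        n++;
    return n;
}

/* Models the sysThreadFree change: unlink a JNI thread but do NOT
 * release its memory, because callers may still use the pointer. */
static void thread_free_sketch(sys_thread_t *t) {
    if (t->is_jni)
        queue_remove(t);
}

/* Models the detach path: clean up first, then release the block. */
static void detach_sketch(sys_thread_t *t) {
    assert(t->is_jni);         /* only JNI-attached threads take this path */
    thread_free_sketch(t);     /* unlink only */
    free(t);                   /* now safe: nothing else holds t */
}

static sys_thread_t *attach_sketch(void) {
    sys_thread_t *t = calloc(1, sizeof *t);
    if (t == NULL)
        return NULL;
    t->is_jni = 1;
    queue_insert(t);
    return t;
}
```

Repeated attach/detach cycles leave the queue empty and every block freed, which is the property the 10,000,000-thread testcase checks for.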
These changes collectively eliminate the leak at 112. I ran my testcase up to 10,000,000 threads with zero
leakage. In the course of debugging this leak I noticed that AttachCurrent leaked memory in a similar
fashion if thread activation failed for any reason. I have fixed that too.
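The AttachCurrent fix amounts to releasing the block on the error path. A minimal sketch, assuming a hypothetical thread_block_t, a hypothetical activate() step whose failure is simulated by a flag, and a live_blocks counter added only to make leaks visible; none of these names come from jni.c.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int activated; } thread_block_t;   /* hypothetical */

static int live_blocks = 0;   /* outstanding allocations, to expose leaks */

static thread_block_t *block_alloc(void) {
    thread_block_t *b = calloc(1, sizeof *b);
    if (b != NULL)
        live_blocks++;
    return b;
}

static void block_free(thread_block_t *b) {
    assert(b != NULL);
    free(b);
    live_blocks--;
}

/* Hypothetical activation step; 'fail' simulates any activation failure. */
static int activate(thread_block_t *b, int fail) {
    if (fail)
        return -1;
    b->activated = 1;
    return 0;
}

/* Returns 0 and stores the block in *out on success. On failure the
 * block is released before returning, so nothing leaks. */
static int attach_sketch(thread_block_t **out, int fail) {
    *out = NULL;
    thread_block_t *b = block_alloc();
    if (b == NULL)
        return -1;
    if (activate(b, fail) != 0) {
        block_free(b);        /* the fix: free on the error path */
        return -1;
    }
    *out = b;
    return 0;
}
```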
At the 114 level, an additional leak was introduced by the MON_FLAT implementation. The count of
active threads is incremented in sysThreadAlloc but is not decremented in sysThreadFree. As a result,
the index table grows continuously. I fixed this as follows:
- Decremented the active thread count for JNI threads in sysThreadFree.
- Zeroed the index table entry for the thread being deactivated. This allows sysThreadAlloc to re-use the entry.
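The two 114 changes can be sketched with a fixed-size slot table. This assumes, for illustration only, that allocation scans for a zeroed entry; MAX_THREADS, index_table, slot_alloc and slot_free are hypothetical names, and the real MON_FLAT table may be organised differently.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_THREADS 8   /* hypothetical table size */

static void *index_table[MAX_THREADS];   /* slot -> thread block; NULL = free */
static int active_threads = 0;

/* Claim the first zeroed slot; returns its index, or -1 if the table is full. */
static int slot_alloc(void *thread) {
    for (int i = 0; i < MAX_THREADS; i++) {
        if (index_table[i] == NULL) {
            index_table[i] = thread;
            active_threads++;
            return i;
        }
    }
    return -1;
}

/* The two fixes: zero the entry so it can be reused, and decrement the count. */
static void slot_free(int i) {
    assert(i >= 0 && i < MAX_THREADS && index_table[i] != NULL);
    index_table[i] = NULL;
    active_threads--;
}
```

Without slot_free, the loop in the usage below would exhaust the eight slots on its ninth iteration; with it, slot 0 is reused every time.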
In fairness, the original behaviour was correct given that the sys_thread_t blocks were only allocated
and never released.
When I was debugging this problem, I discovered that sysMonitorEnterQuicker was dereferencing a
null pointer. This is because the JNI code requests a (JVM-style) lock before the thread has been
introduced into the JVM. As a result, sysThreadSelf() returns a null pointer (which is valid, because the
JVM has not heard of the thread at this point). sysMonitorEnterQuicker does not check for null,
and so dereferences the null pointer. This is harmless in the non-debug case,
because on AIX dereferencing null returns null. It does, however, mean that the lock is not
really obtained; indeed, it introduces a disallowed state in which the monitor usage count is
greater than zero and the owner field is null. This would make the general behaviour of the
monitor system somewhat undefined.
I haven't done anything to fix this, because it needs to be fixed by somebody who understands the
MON_FLAT implementation in general.
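For illustration only, the invariant described above (a usage count greater than zero implies a non-null owner) and a null check on the caller can be sketched against a hypothetical flat_monitor_t. This is not the MON_FLAT code and does not propose how that implementation should be fixed; the struct and function names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int id; } sys_thread_t;   /* hypothetical stand-in */

typedef struct {
    sys_thread_t *owner;   /* NULL when unowned */
    int count;             /* recursion count */
} flat_monitor_t;

/* The invariant the report says is violated: count > 0 implies an owner. */
static int monitor_consistent(const flat_monitor_t *m) {
    return m->count == 0 || m->owner != NULL;
}

/* Returns 0 if the lock was taken, -1 otherwise. A NULL 'self' (a thread
 * not yet known to the VM) is refused instead of being recorded as owner. */
static int monitor_enter_checked(flat_monitor_t *m, sys_thread_t *self) {
    int rc = -1;
    if (self != NULL) {
        if (m->owner == NULL) {
            m->owner = self;
            m->count = 1;
            rc = 0;
        } else if (m->owner == self) {
            m->count++;
            rc = 0;
        }
        /* else: owned by another thread; a real monitor would block here */
    }
    assert(monitor_consistent(m));
    return rc;
}
```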
======================================================================