Something is causing the monitor cache to leak monitors when threads are
explicitly killed. The original test case used WebRunner to go back and
forth between two pages, which would normally leak one monitor in each
direction. The monitor had been owned by a DWIChildUpdater thread, which
is created and killed in each direction.
The bug is caused by the use of an entry_count to decide when the monitor is
to be removed from the cache. entry_count is incremented on monitorEnter() and
decremented only on monitorExit(). If the decremented entry_count is 0 in
monitorExit(), the monitor is removed from the cache. The trouble is that a
thread which has entered the monitor may end up on its condvar_waitq, and so
has not left the monitor as far as entry_count is concerned. If that thread is
killed, it is removed from the condvar_waitq at a very low level that doesn't
know about entry_count. That can remove the last user of the monitor without
removing the monitor from the cache. Eventually the cache fills up with leaked
monitors, and the system hangs waiting for notification of a free cache monitor.
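The accounting problem can be reproduced in a toy Python model. This is a sketch, not the Oak runtime's code: the classes, the method names (monitor_enter, monitor_exit, wait, kill), the capacity limit, and the one-monitor-per-updater keys are all hypothetical. A DWIChildUpdater-style thread enters a monitor, parks on its condvar_waitq, and is killed. Because kill() never touches entry_count, each cycle strands one monitor in the cache until it is full. The model raises an exception where the real system would hang. The fixed=True path, which decrements entry_count when a killed thread is pulled off a waitq, is one possible remedy and not necessarily the fix mentioned in this report.

```python
from collections import deque


class Monitor:
    """Toy cached monitor (hypothetical model, not the real runtime structure)."""

    def __init__(self, key):
        self.key = key
        self.owner = None
        self.entry_count = 0            # +1 in monitor_enter, -1 only in monitor_exit
        self.condvar_waitq = deque()    # threads parked in wait()


class MonitorCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.monitors = {}

    def lookup(self, key):
        if key not in self.monitors:
            if len(self.monitors) >= self.capacity:
                # The real system blocks here waiting for a free cache monitor;
                # the model raises instead so the "hang" is observable.
                raise RuntimeError("monitor cache full")
            self.monitors[key] = Monitor(key)
        return self.monitors[key]

    def monitor_enter(self, thread, key):
        mon = self.lookup(key)
        mon.entry_count += 1
        mon.owner = thread
        return mon

    def monitor_exit(self, thread, key):
        mon = self.monitors[key]
        mon.owner = None
        mon.entry_count -= 1
        if mon.entry_count == 0:        # last user gone: release the cache slot
            del self.monitors[key]

    def wait(self, thread, key):
        # The thread gives up ownership and parks, but entry_count is unchanged,
        # so the monitor still counts this thread as "inside".
        mon = self.monitors[key]
        mon.owner = None
        mon.condvar_waitq.append(thread)

    def kill(self, thread, fixed=False):
        # Low-level removal from every condvar_waitq. The buggy path (fixed=False)
        # ignores entry_count, so the monitor is never freed.
        for key, mon in list(self.monitors.items()):
            if thread in mon.condvar_waitq:
                mon.condvar_waitq.remove(thread)
                if fixed:               # hypothetical fix: account for the killed thread
                    mon.entry_count -= 1
                    if mon.entry_count == 0:
                        del self.monitors[key]


def simulate_page_flips(flips, capacity, fixed=False):
    """Each flip creates an updater that enters, waits, and is killed."""
    cache = MonitorCache(capacity)
    for i in range(flips):
        updater = f"DWIChildUpdater-{i}"
        key = ("child-updater", i)      # hypothetical: a fresh monitor per updater
        cache.monitor_enter(updater, key)
        cache.wait(updater, key)
        cache.kill(updater, fixed=fixed)
    return len(cache.monitors)          # monitors still occupying the cache
```

For example, simulate_page_flips(4, capacity=4) leaves 4 monitors in the cache, a fifth flip raises RuntimeError (standing in for the hang), and with fixed=True the cache ends up empty regardless of the number of flips.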
There is a trivial fix I'm working on, but it seems to tickle other bugs. In addition,
Jon removed the killing of the DWIChildUpdater threads from the freeze system so
as not to hit the problem (delta 1.20 of DisplayItemWindow.oak), but backing out
that change should unmask the bug.