Name: mf23781 Date: 08/16/99
(3) Test Case and Failure Data:
Description of Problem:
This relates to Sun Bug 4243978, a potential race condition when using weak
references and the publicly accessible enqueue method.
There were a number of suggested fixes provided in the original bug report.
Code was provided for one of the sample solutions. Further analysis of the problem
by teams here in IBM Hursley and IBM Haifa has raised concerns about the suitability of the
coding of the suggested fix.
It is believed that additional race conditions relating to thread safety could be
exposed. The most obvious solution to the problem is unfortunately also risky, as there
may be cases where deadlock could occur.
Problem Analysis:
The problem arises when the (public) enqueue method is called while the item is on the
pending list. This would result in the lists being destroyed. (A test case is available
for the previous Sun bug which demonstrates this well.)
The suggested fix works by modifying the enqueue method to ensure that the pointers
making up the lists maintain their integrity.
Concern has arisen that, while the pointers are being checked, changes could be made to the
lists by the garbage collection and reference handler threads. A solution to this is to
hold the reference lock while these operations occur. However, this could result in a
deadlock.
Full details are below - extracted from inter-IBM communications:-
1) The problem:
The idea of removing an object from the pending list before enqueuing the object is fine.
However, note that the garbage collector and the reference handler may be running in parallel
with this activity. Therefore, various race conditions may occur in the current solution. For example,
take the first statement. It checks if "this.next" is null. The answer may be true, but then the
thread is stopped and the garbage collector may run a collection putting this reference object
in the pending list. After that, the program thread resumes and executes the actual enqueuing.
The problem is that the if-then operation is not atomic. The same problem occurs 4 times with each
of the "else" cases of the solution.
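The interleaving described above can be reproduced deterministically in a toy model. The sketch below is not the real java.lang.ref code: the class CheckThenActRace, the Ref type with a single shared link, and the preempt hook (which stands in for the thread being stopped between the check and the act) are all assumptions made for illustration.

```java
/** Toy model only; the names here are hypothetical, not the real java.lang.ref code. */
class CheckThenActRace {
    /** Stand-in for a reference object whose single link serves both lists. */
    static final class Ref {
        Ref next; // null = on no list; a self-link marks the tail of a list
    }

    static Ref pendingHead; // list built by the simulated collector
    static Ref queueHead;   // list built by enqueue

    /** Simulated collector: pushes r onto the front of the pending list. */
    static void collectorAddsToPending(Ref r) {
        r.next = (pendingHead == null) ? r : pendingHead;
        pendingHead = r;
    }

    /** Unsynchronized enqueue; preempt runs in the gap between check and act. */
    static boolean enqueue(Ref r, Runnable preempt) {
        if (r.next != null) {
            return false;  // check: already on some list
        }
        preempt.run();     // the thread is stopped here and the collector runs
        r.next = (queueHead == null) ? r : queueHead; // act: overwrites the link
        queueHead = r;
        return true;
    }

    /** Walks a list from head and reports whether target is still linked in. */
    static boolean reachable(Ref head, Ref target) {
        for (Ref r = head; r != null; r = (r.next == r) ? null : r.next) {
            if (r == target) {
                return true;
            }
        }
        return false;
    }

    /** Returns {A pending before, enqueue succeeded, A pending after}. */
    static boolean[] demo() {
        pendingHead = null;
        queueHead = null;
        Ref a = new Ref();
        Ref b = new Ref();
        collectorAddsToPending(a);
        boolean before = reachable(pendingHead, a);
        // The collector puts B on the pending list after enqueue's check passed.
        boolean enqueued = enqueue(b, () -> collectorAddsToPending(b));
        boolean after = reachable(pendingHead, a);
        return new boolean[] {before, enqueued, after};
    }
}
```

Calling CheckThenActRace.demo() shows the stale check letting enqueue overwrite B's link, which cuts A off from the pending list. This is one way the lists can be destroyed; the real code paths may differ in detail.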
2) A possible solution:
A possible solution is to hold Reference.lock throughout the Reference.enqueue method. This
would solve all the race problems but would cause some undesired effects. Holding the lock for a long time
hinders the collector, as well as other parallel program threads, from working. This poses a scalability
problem. In addition, we might expose the JVM to deadlocks. Note that the ReferenceQueue.enqueue
method synchronizes on the reference object itself. Under the proposed solution, enqueuing
an object acquires Reference.lock, then synchronizes on the reference object itself, and finally
acquires the ReferenceQueue lock. Throughout this time a garbage collection cannot start, since the
Reference.enqueue method holds the Reference.lock that the collector must acquire. Now, a second program
thread may synchronize on the reference object and then allocate an object, triggering a garbage collection.
At the same time, the first program thread, executing Reference.enqueue, has already acquired
Reference.lock and is waiting to synchronize on the reference object. In this case the system
deadlocks: the thread that holds the lock on the reference object is waiting for the collection, and the thread
that enqueues is waiting for that thread to release the lock on the reference object, but the collection cannot
start because the collector is waiting for the enqueuing thread to release Reference.lock.
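The wait cycle described above can be illustrated without actually hanging a JVM. The sketch below is a toy model under stated assumptions: two ReentrantLock objects stand in for Reference.lock and for the monitor of the reference object, the collector is folded into the allocating thread, and tryLock is used so that a failed probe marks the point where a blocking acquire would wait forever. The class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

/** Toy model only; these locks are hypothetical stand-ins for the JVM's. */
class EnqueueDeadlockSketch {
    static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    /** A thread that holds one lock and then probes the other without blocking. */
    static Thread party(ReentrantLock held, ReentrantLock wanted,
                        CountDownLatch bothHeld, CountDownLatch bothProbed,
                        boolean[] stuck, int index) {
        return new Thread(() -> {
            held.lock();
            try {
                bothHeld.countDown();
                await(bothHeld);         // both locks are now held
                if (wanted.tryLock()) {
                    wanted.unlock();
                } else {
                    stuck[index] = true; // a blocking acquire would wait here forever
                }
                bothProbed.countDown();
                await(bothProbed);       // keep holding until both have probed
            } finally {
                held.unlock();
            }
        });
    }

    /** Returns true if each party was stuck waiting for the other's lock. */
    static boolean wouldDeadlock() {
        ReentrantLock referenceLock = new ReentrantLock(); // models Reference.lock
        ReentrantLock refMonitor = new ReentrantLock();    // models synchronized (ref)
        CountDownLatch bothHeld = new CountDownLatch(2);
        CountDownLatch bothProbed = new CountDownLatch(2);
        boolean[] stuck = new boolean[2];

        // Thread 1: Reference.enqueue holds Reference.lock and wants the monitor.
        Thread enqueuer = party(referenceLock, refMonitor, bothHeld, bothProbed, stuck, 0);
        // Thread 2: holds the monitor; its allocation needs a collection,
        // and the collection needs Reference.lock.
        Thread allocator = party(refMonitor, referenceLock, bothHeld, bothProbed, stuck, 1);

        enqueuer.start();
        allocator.start();
        try {
            enqueuer.join();
            allocator.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return stuck[0] && stuck[1];
    }
}
```

EnqueueDeadlockSketch.wouldDeadlock() returns true when both probes fail, meaning each side holds the lock the other needs. With blocking acquisition in place of tryLock, neither thread would ever proceed.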
(4) Targeted FCS Release:
Originally reported against 1.2.2. The aim is to reach agreement on what form the fix is likely
to take.
(5) Operational/Business Justification:
Impact of bug on affected product:
The original situation was raised as part of a porting exercise. The aim behind the
first report was to establish the direction that a fix would take. The porting team
is concerned that the fix has potential problems.
A revised fix, however, involves more extensive changes to the JVM, which require analysis.
Time factors and deadlines involved:
This problem has not yet been reported by a customer - it was found by code examination.
(6) Suggested Fix:
Suggested Fix:
We believe that the best way around the original bug is to add a private field to reference objects
called pending_next, which should be used to link the pending list without interfering with the
queue. The disadvantage of this solution is that it adds 4-8 bytes to each reference object
(depending on the size of a pointer), but this seems better than harming scalability (with long locked
paths) and exposing the system to deadlocks.
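The effect of a separate link field can be sketched in a toy model. This fix has not been coded, so the code below is not an implementation of it: the class PendingNextSketch, the Ref type, the pendingNext field (a Java-style rendering of pending_next), and the preempt hook that simulates the collector running in the middle of enqueue are all assumptions made for illustration. Synchronization of the queue itself is left out.

```java
/** Toy model only; pendingNext is a hypothetical rendering of pending_next. */
class PendingNextSketch {
    static final class Ref {
        Ref next;        // queue link only
        Ref pendingNext; // pending-list link only; a self-link marks the tail
    }

    static Ref pendingHead; // list built by the simulated collector
    static Ref queueHead;   // list built by enqueue

    /** Simulated collector: links through pendingNext and never touches next. */
    static void collectorAddsToPending(Ref r) {
        r.pendingNext = (pendingHead == null) ? r : pendingHead;
        pendingHead = r;
    }

    /** Enqueue links through next only; preempt simulates the collector cutting in. */
    static boolean enqueue(Ref r, Runnable preempt) {
        if (r.next != null) {
            return false;  // already enqueued
        }
        preempt.run();     // the collector may run here without harming the queue link
        r.next = (queueHead == null) ? r : queueHead;
        queueHead = r;
        return true;
    }

    /** Walks the pending list and reports whether target is still linked in. */
    static boolean onPendingList(Ref target) {
        for (Ref r = pendingHead; r != null;
                r = (r.pendingNext == r) ? null : r.pendingNext) {
            if (r == target) {
                return true;
            }
        }
        return false;
    }

    /** Returns {A still pending, B still pending, B on the queue}. */
    static boolean[] demo() {
        pendingHead = null;
        queueHead = null;
        Ref a = new Ref();
        Ref b = new Ref();
        collectorAddsToPending(a);
        enqueue(b, () -> collectorAddsToPending(b));
        return new boolean[] {onPendingList(a), onPendingList(b), queueHead == b};
    }
}
```

In this model the same interleaving leaves both lists intact, because the two lists no longer share a pointer. B ends up on the pending list and on the queue at once; how the reference handler should treat such an object is a design question the sketch does not settle.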
Documentation of how root cause was found:
The original problem was found as a result of code examination, and this potential problem was found
in a similar manner.
Alternative Fixes (advantages/disadvantages):
Results of IBM Testing in application/customer environment:
This fix has not been coded so no results are available. Changes would have to be made as the concept
of "status" is being modified.
Regression Test Run Status/Results:
n/a
JCK Test run status:
n/a
(Review ID: 93963)
======================================================================
Duplicates: JDK-4243978 (ref) Race condition in Reference.enqueue() - Closed