JDK-4262633

java.lang.ref.Reference - Enqueue Race Condition - Fix may not be thread safe


    • Type: Bug
    • Resolution: Duplicate
    • Priority: P4
    • None
    • 1.2.0
    • hotspot
    • x86
    • windows_nt



      Name: mf23781 Date: 08/16/99


      (3) Test Case and Failure Data:

               Description of Problem:
                  This relates to Sun Bug - 4243978. It concerns a potential race condition
                  when using Weak References and the publicly accessible enqueue method.
                  
                  A number of suggested fixes were provided in the original bug report.
                  Code was provided for one of the sample solutions. Further analysis of the problem
                  by teams here in IBM Hursley and IBM Haifa has raised concerns about the suitability of the
                  coding of the suggested fix.
                  
                  It is believed that the suggested fix could expose additional race conditions, because
                  it is not thread safe. The most obvious solution to the problem is unfortunately also risky, as there
                  may be cases where deadlock could occur.
            
               Problem Analysis:
                                                         
                  The problem arose because, when the (public) enqueue method is called, the item could already be on the
                  pending list. This would result in the lists being destroyed. (A test case is available
                  for the previous Sun bug which demonstrates this well.)
                  
                  The suggested fix works by modifying the enqueue method to ensure that the pointers
                  making up the release maintain their integrity.
                  
                  Concern has arisen that, while the pointers are being checked, changes could be made to the
                  lists by the Garbage Collection and Reference Handler threads. A solution to this is to
                  hold the reference lock while these operations occur. However, this could result in a
                  deadlock situation.
                  
                  Full details are below, extracted from internal IBM communications:
                  
                  1) The problem:
                      The idea of removing an object from the pending list before enqueuing the object is fine.
                      However, note that the garbage collector and the reference handler may also be running in parallel
                      with this activity. Therefore, various race conditions may occur in the current solution. For example,
                      take the first statement. It checks if "this.next" is null. The answer may be true, but then the
                      thread is stopped and the garbage collector may run a collection putting this reference object
                      in the pending list. After that, the program thread resumes and executes the actual enqueuing.
                      The problem is that the if-then operation is not atomic. The same problem occurs 4 times with each
                      of the "else" cases of the solution.
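
The non-atomic check-then-act described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the names ToyRef, CheckThenActRace, enqueueUnsafe and collectorAddsToPending are hypothetical and do not mirror the real java.lang.ref.Reference code, and the interleaving is replayed deterministically on one thread by running the "collector" step in the gap between the check and the update.

```java
// Toy model only; all names here are hypothetical, not JDK code.
final class ToyRef {
    // Single link shared by the pending list and the queue in this model.
    // null means "on no list"; the last node of a list points to itself.
    ToyRef next;
}

final class CheckThenActRace {
    ToyRef pendingHead; // list the "collector" pushes onto
    ToyRef queueHead;   // list the program thread pushes onto

    // Stand-in for the collector putting r on the pending list.
    void collectorAddsToPending(ToyRef r) {
        r.next = (pendingHead == null) ? r : pendingHead;
        pendingHead = r;
    }

    // Unsafe enqueue: the null check and the link update are separate steps.
    // 'gap' stands in for whatever another thread does between them.
    boolean enqueueUnsafe(ToyRef r, Runnable gap) {
        if (r.next != null) {
            return false;                               // check
        }
        gap.run();                                      // window for interference
        r.next = (queueHead == null) ? r : queueHead;   // act: overwrites the link
        queueHead = r;
        return true;
    }

    // Counts the nodes reachable from the pending head.
    int pendingLength() {
        int n = 0;
        for (ToyRef r = pendingHead; r != null; r = (r.next == r) ? null : r.next) {
            n++;
        }
        return n;
    }

    // Replays the interleaving: a is already pending, then b is made pending
    // by the collector in the middle of the program thread's enqueue of b.
    static int demo() {
        CheckThenActRace lists = new CheckThenActRace();
        ToyRef a = new ToyRef();
        ToyRef b = new ToyRef();
        lists.collectorAddsToPending(a);
        lists.enqueueUnsafe(b, () -> lists.collectorAddsToPending(b));
        return lists.pendingLength(); // 1, not 2: a is no longer reachable
    }
}
```

In this model the pending list should hold two entries after the collector step, but the enqueue overwrites the link of b, so the entry a is lost from the pending list.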

                  2) A possible solution:
                      A possible solution is to hold Reference.lock throughout the Reference.enqueue method. This
                      would solve all the race problems but would cause some undesired effects. Holding the lock for a long time
                      hinders the collector, as well as other parallel program threads, from working. This poses a scalability
                      problem. In addition, we might expose the JVM to deadlocks. Note that the ReferenceQueue.enqueue
                      method synchronizes on the reference object itself. The way the proposed solution works is that enqueuing
                      an object means obtaining Reference.lock, then synchronizing on the reference object itself, and lastly
                      obtaining ReferenceQueue.lock. During all this time, a garbage collection cannot start, since the
                      Reference.enqueue method holds the Reference.lock that the collector must acquire. Now, a second program
                      thread may synchronize on the reference object and then allocate an object, causing a need for garbage collection.
                      At the same time, the first program thread, which is executing the Reference.enqueue method, has already acquired
                      Reference.lock and is waiting to synchronize on the reference object. In this case, the system gets into a
                      deadlock: the thread that holds the lock on the actual reference object is waiting for the collection, the thread
                      that enqueues is waiting for that thread to release the lock on the reference object, and the collection cannot
                      start since the collector is waiting for the enqueuing thread to release Reference.lock.
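
The lock-ordering cycle described above can be sketched as follows. This is a simplified model, not JVM code: two ReentrantLock objects stand in for Reference.lock and for the monitor of the reference object, the second program thread and the collector it is waiting for are collapsed into one thread (an assumption made to keep the example short), and timed tryLock calls are used so that the program reports the cycle instead of hanging. The class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Simplified model of the cycle; all names here are hypothetical.
final class EnqueueDeadlockSketch {

    // Returns true if each thread timed out waiting for the lock the other holds.
    static boolean wouldDeadlock() {
        final ReentrantLock referenceLock = new ReentrantLock(); // models Reference.lock
        final ReentrantLock refMonitor = new ReentrantLock();    // models the reference object's monitor
        final CountDownLatch bothHoldFirst = new CountDownLatch(2);
        final CountDownLatch bothTried = new CountDownLatch(2);
        final boolean[] gotSecond = new boolean[2];

        // Enqueuing thread: takes Reference.lock, then wants the reference object.
        Thread enqueuer = new Thread(() ->
            holdThenTry(referenceLock, refMonitor, bothHoldFirst, bothTried, gotSecond, 0));
        // Second thread: holds the reference object, then (via the collector it
        // has triggered) needs Reference.lock.
        Thread allocator = new Thread(() ->
            holdThenTry(refMonitor, referenceLock, bothHoldFirst, bothTried, gotSecond, 1));

        enqueuer.start();
        allocator.start();
        try {
            enqueuer.join();
            allocator.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !gotSecond[0] && !gotSecond[1];
    }

    private static void holdThenTry(ReentrantLock first, ReentrantLock second,
                                    CountDownLatch bothHoldFirst, CountDownLatch bothTried,
                                    boolean[] gotSecond, int slot) {
        first.lock();
        try {
            // Wait until both threads hold their first lock.
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            // Try the second lock with a timeout instead of blocking forever.
            boolean got = second.tryLock(200, TimeUnit.MILLISECONDS);
            gotSecond[slot] = got;
            if (got) {
                second.unlock();
            }
            // Keep holding the first lock until both attempts have finished.
            bothTried.countDown();
            bothTried.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }
}
```

With untimed lock acquisition in place of tryLock, neither thread in this model could ever proceed, which is the situation the analysis above warns about.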




          
                  
      (4) Targeted FCS Release:
                  
                  Originally reported against 1.2.2. The aim is to reach agreement on what form the fix is likely
                  to take.


      (5) Operational/Business Justification:

             Impact of bug on affected product:
                   The original situation was raised as part of a porting exercise. The aim behind the
                   first report was to obtain a potential direction that a fix would take. The porting team
                   are concerned that the fix has potential problems.
                   
                   A revised fix, however, involves more extensive changes to the JVM, which require analysis.

             Time factors and deadlines involved:
                  This problem has not yet been reported by a customer - it was found by code examination.
             
      (6) Suggested Fix:

              Suggested Fix:
              
                  We believe that the best way around the original bug is to add a private field to reference objects
                  called pending_next, which should be used to link the pending list without interfering with the
                  queue. The disadvantage of this solution is that it takes 4-8 more bytes in each reference object
                  (depending on the length of a pointer), but it seems better than foiling scalability (with long locked
                  paths) and exposing the system to deadlocks.
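
A minimal sketch of the separate-link idea, under stated assumptions: the class and field names below (TwoLinkRef, pendingNext, queueNext, SeparateLinksSketch) are hypothetical, the real field would be the pending_next field proposed above, and the interleaving is again replayed on a single thread. The sketch shows only that the two lists no longer share a link. It does not address the remaining synchronization, or how a reference that is on both lists at once should be handled.

```java
// Toy model only; all names here are hypothetical, not JDK code.
final class TwoLinkRef {
    TwoLinkRef queueNext;   // link used only by the reference queue
    TwoLinkRef pendingNext; // extra link used only by the pending list
}

final class SeparateLinksSketch {
    TwoLinkRef pendingHead;
    TwoLinkRef queueHead;

    // Stand-in for the collector putting r on the pending list.
    void collectorAddsToPending(TwoLinkRef r) {
        r.pendingNext = (pendingHead == null) ? r : pendingHead;
        pendingHead = r;
    }

    // Enqueue touches only queueNext, so it cannot disturb the pending list.
    // 'gap' stands in for whatever another thread does between check and update.
    boolean enqueue(TwoLinkRef r, Runnable gap) {
        if (r.queueNext != null) {
            return false;
        }
        gap.run();
        r.queueNext = (queueHead == null) ? r : queueHead;
        queueHead = r;
        return true;
    }

    // Counts the nodes reachable from the pending head.
    int pendingLength() {
        int n = 0;
        for (TwoLinkRef r = pendingHead; r != null;
             r = (r.pendingNext == r) ? null : r.pendingNext) {
            n++;
        }
        return n;
    }

    // Same interleaving as in the race: the collector makes b pending in the
    // middle of the program thread's enqueue of b.
    static int demo() {
        SeparateLinksSketch lists = new SeparateLinksSketch();
        TwoLinkRef a = new TwoLinkRef();
        TwoLinkRef b = new TwoLinkRef();
        lists.collectorAddsToPending(a);
        lists.enqueue(b, () -> lists.collectorAddsToPending(b));
        return lists.pendingLength(); // 2: both entries are still on the pending list
    }
}
```

The cost in this model is the one extra reference field per object, which corresponds to the 4-8 bytes mentioned above.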
          
                                      
              Documentation of how root cause was found:
                  The original problem was found as a result of code examination, and this potential problem was found
                  in a similar manner.
              
              Alternative Fixes (advantages/disadvantages):
              
              Results of IBM Testing in application/customer environment:
                  This fix has not been coded so no results are available. Changes would have to be made as the concept
                  of "status" is being modified.
              
              Regression Test Run Status/Results:
                      n/a

              JCK Test run status:
                      n/a
              


      (Review ID: 93963)

      ======================================================================

            Unassigned Unassigned
            miflemi Mick Fleming