Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8028736

Crash in G1 after making a method with an embedded oop at the start of the code not-entrant

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Won't Fix
    • Icon: P2 P2
    • hs25
    • hs25
    • hotspot
    • gc

      When making an nmethod non-entrant, the first few bytes of the code (e.g. 5 bytes on x86) are patched with a jump.

      If this area contains an oop, or an oop overlaps into it, G1 may crash later because it remembers that nmethod in its code roots remembered set indefinitely.

      Particularly, when this nmethod will be zombified and removed, this nmethod will not be removed from the region's code root set it has originally been registered to, because oop iteration will skip that area (see nmethod.cpp line 1851ff). So the code root set contains a stale nmethod reference that will never be cleared. So when later, after the nmethod has been removed, and the GC accesses it, the VM most likely crashes.

      An example for such a method that can cause that is:

      public class X {
        static final Object someobject;

        // assign some value to someobject somewhere
        
        // the following method is compiled
        static Object getSomeObject() {
          return someobject;
        }

        // something else
      }

      A simple fix is to, before patching the nmethod, i.e. making it not-entrant, unregister that nmethod and after that re-register for this kind of methods.

      I.e. in pseudo-code, in nmethod::make_not_entrant_or_zombie, line 1338ff:

      if (nmethod contains an oop in the patched area) {
        unregister nmethod with heap
        needs_reregister = true
      }
            NativeJump::patch_verified_entry(entry_point(), verified_entry_point(),
                        SharedRuntime::get_handle_wrong_method_stub());
      if (needs_reregister) {
        register nmethod with heap
      }

      Problem is that this register/unregister needs to be done with the codecache_lock held, but this is not the case, and further we are currently holding the patching_lock. Which means we cannot just take the codecache_lock as these two mutexes have the same rank.

      Optimization of this would check first if an oop is overwritten, and second if oop is the last reference to that region(!) within that nmethod, and only then unregister.

      Originally reported by C. Kotselidis.

      ILW: Impact: high (crash), Likelyhood: L (only ever reproduced on reporters setup), Workaround: H (none) -> P2

            Unassigned Unassigned
            tschatzl Thomas Schatzl
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Created:
              Updated:
              Resolved: