  JDK
  JDK-8028280

ParkEvent leak when running modified runThese which only loads classes



        When running runTheseC (compileThese) we've run into native OOME, primarily on 32-bit Windows builds running on large Windows machines.

        I left runTheseC running overnight on a Solaris machine in the hope of using libumem's memory leak detection. I couldn't get any useful information from it, but we're definitely leaking something:
        $ pmap 5287 |grep heap
        0000000000411000 4193312K rw--- [ heap ]
        0000000100319000 2350824K rw--- [ heap ]
        Metaspace usage is around 11MB with 40MB committed so we don't have a lot of live classes it seems.

        I used libumem to gather some snapshots of all malloc() calls in a run. One thing that shows up is the allocation of ParkEvents, which are leaked (intentionally, it appears).
        runThese aggressively spawns threads which open JAR files, which seem to end up in JVM_RawMonitorEnter:

        ParkEvents on Solaris are 440 bytes each, and there are >10000 of them on the ParkEvent::FreeList after an hour of running the compileThese version of runThese.

        I also tried an instrumented build on Windows, where I used HeapCreate to create a separate memory heap for allocating ParkEvents, so that they can be tracked externally to the process. After running runTheseC for around 30 minutes that heap had grown to 256MB.

        A theory for the root cause of this is that ParkEvent::Allocate is not designed to handle the load of 15-16 threads contending on a Monitor* through the JVM_RawMonitor* API.
        Using the RawMonitor functions prevents the VM from using the JavaThread's ParkEvent and forces all those contending threads to hit ParkEvent::Allocate.

        ParkEvents are maintained on a lock-free free list which is designed to avoid ABA problems by doing push-one, pop-all. While one thread has the list detached and is still CASing on the FreeList, other threads can see an empty list and allocate new ParkEvents, so there is a potential for allocation spikes.
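
To illustrate the push-one, pop-all pattern described above, here is a minimal sketch. It is not the HotSpot code: `Event`, `FreeList`, `allocate`, `release` and `fresh_allocations` are hypothetical names, and the sketch assumes (as the theory above does) that an empty-looking list leads straight to a fresh allocation that is never freed.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for ParkEvent; only the free-list link matters here.
struct Event {
    Event* next = nullptr;
};

class FreeList {
    std::atomic<Event*> head_{nullptr};
    std::atomic<std::size_t> fresh_{0};   // number of events created with new

public:
    // Push one: link a single event in front of the current head.
    void release(Event* ev) {
        assert(ev != nullptr);
        Event* old = head_.load(std::memory_order_relaxed);
        do {
            ev->next = old;   // 'old' is refreshed by a failed CAS
        } while (!head_.compare_exchange_weak(old, ev,
                                              std::memory_order_release,
                                              std::memory_order_relaxed));
    }

    // Pop all: detach the whole list, keep the first event, put the rest back.
    Event* allocate() {
        Event* list = head_.exchange(nullptr, std::memory_order_acquire);
        if (list == nullptr) {
            // The list looked empty. That is also what a thread sees while
            // another thread holds the detached list, so under contention
            // this path can run far more often than the live set requires.
            fresh_.fetch_add(1, std::memory_order_relaxed);
            return new Event();   // assumption: never freed, only recycled
        }
        Event* rest = list->next;
        list->next = nullptr;
        if (rest != nullptr) {
            // Find the tail of the residual list and splice it back in
            // front of anything that was pushed in the meantime.
            Event* tail = rest;
            while (tail->next != nullptr) tail = tail->next;
            Event* old = head_.load(std::memory_order_relaxed);
            do {
                tail->next = old;
            } while (!head_.compare_exchange_weak(old, rest,
                                                  std::memory_order_release,
                                                  std::memory_order_relaxed));
        }
        return list;
    }

    std::size_t fresh_allocations() const {
        return fresh_.load(std::memory_order_relaxed);
    }
};
```

In this sketch the window between the `exchange` in `allocate()` and the splice-back is where other threads would observe an empty list. With 15-16 threads contending, every call that lands in that window adds one more event that is only ever recycled, never freed, which would match the steadily growing FreeList seen above.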

        I=H (aggressive memory leak if this problem occurs, can easily lead to crash due to OOME)
        L=L (very unlikely situation)
        W=H (no known work-around if this situation arises)


