-
Bug
-
Resolution: Fixed
-
P3
-
hs20
-
b02
-
generic
-
generic
-
Not verified
Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-2202394 | 7 | Tony Printezis | P3 | Resolved | Fixed | b118 |
JDK-2205841 | 6u25 | Tony Printezis | P3 | Resolved | Fixed | b01 |
While testing another set of changes I got a few BOT-related (block offset table) assertion failures when running the dacapo pmd benchmark. They always seemed to happen while the BOT was being set up as part of a humongous region / object allocation. The assertions check that the BOT has been set up correctly and complain when they detect inconsistencies. I also noticed that this always happened shortly after a cleanup.
Added instrumentation proved that the region that we had just allocated to satisfy a humongous allocation request (and whose BOT was found to be inconsistent) had just been freed during the last cleanup pause.
I'm still trying to prove this with added instrumentation, but this is the race that I think is causing the failure. When we do the cleanup pause we have some update buffers with entries that point into regions that we are about to free (this is definitely the case; I've proven that with instrumentation; I'm still trying to prove that a failure follows this scenario and that the regions involved are the same). When we allocate one of those regions to satisfy the humongous allocation request, its top() is set to end() before the BOT is set up. The concurrent refinement thread might therefore try to refine parts of said region, and in doing so it might try to make some of the BOT entries more fine-grained, concurrently with the thread that's allocating the humongous regions. So, the BOT was becoming inconsistent not because the thread that set it up did so wrongly, but because the concurrent refinement thread was modifying it at the same time.
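The suspected interleaving can be sketched with a small toy model. This is not HotSpot code and not the fix: the class `HumongousBotRace`, the 8-card region, and the helpers `publishTop`, `setUpBot`, `refineStaleCard` and `botConsistent` are all hypothetical names invented for illustration, and the race is replayed as a fixed sequence of steps on one thread so that the outcome is deterministic.

```java
import java.util.Arrays;

public class HumongousBotRace {
    static final int CARDS = 8;  // cards per toy region (hypothetical size)
    static final int UNSET = -1; // BOT entry that has not been set up yet

    static final class Region {
        int top = 0;                      // 0 == bottom: the region is free
        final int end = CARDS;
        final int[] bot = new int[CARDS]; // bot[i]: card where the block covering card i starts

        Region() {
            Arrays.fill(bot, UNSET);
        }
    }

    // Allocator, step 1: top is set to end before the BOT is set up.
    static void publishTop(Region r) {
        r.top = r.end;
    }

    // Allocator, step 2, done in slices: every card in [from, to) points
    // back to the start of the humongous object at card 0.
    static void setUpBot(Region r, int from, int to) {
        Arrays.fill(r.bot, from, to, 0);
    }

    // Refinement of a stale update-buffer entry. Writing bot[card] = card is
    // only a stand-in for "making a BOT entry more fine-grained".
    static boolean refineStaleCard(Region r, int card) {
        if (card >= r.top) {
            return false; // card is above top: nothing allocated there, skip it
        }
        r.bot[card] = card;
        return true;
    }

    // The allocator expects every entry to point at the object start (card 0).
    static boolean botConsistent(Region r) {
        for (int start : r.bot) {
            if (start != 0) {
                return false;
            }
        }
        return true;
    }

    // Replays the suspected interleaving on a region freed by the last cleanup.
    static boolean simulateRace() {
        Region r = new Region();
        publishTop(r);                 // allocator: top == end, BOT not yet set up
        setUpBot(r, 0, CARDS / 2);     // allocator: first half of the BOT
        refineStaleCard(r, 2);         // refinement: stale entry now looks valid
        setUpBot(r, CARDS / 2, CARDS); // allocator: second half of the BOT
        return botConsistent(r);       // what the assertion would check
    }

    public static void main(String[] args) {
        Region free = new Region();
        System.out.println("stale card refined in free region: " + refineStaleCard(free, 2));
        System.out.println("BOT consistent after interleaving: " + simulateRace());
    }
}
```

In this model the stale card is skipped while the region is still free (its top equals bottom), but once top has been moved to end the same card passes the check and the refinement step overwrites an entry the allocator has already written, which is the kind of inconsistency the assertions reported.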
- backported by
-
JDK-2202394 G1: race between concurrent refinement and humongous object allocation
- Resolved
-
JDK-2205841 G1: race between concurrent refinement and humongous object allocation
- Resolved
- relates to
-
JDK-7010490 G1: BOT: join_blocks() is dead code
- Closed