It initially came out of some problems
Peter Kessler found in HotSpot's clone code. That led me to
review EVM's code, where I found that we were violating a
handshake the concurrent collectors expect: nearClass is set
to NULL initially and is set to its proper value only after
the rest of the object has been initialized.
To make sure that would work, I also checked the three cases
where concurrent activity looks at objects (marking, sweeping,
and concurrent refinement). I quickly realised there was a
simple race between the concurrent refinement thread and
memory allocation in the CMS generation. Then I couldn't
see how refinement is blocked during sweeping, and I started
noticing more apparent races. :-( I now have scenarios that
lead either to walking corrupt objects or to directly summarizing
bad reference locations if sweeping is allowed while concurrent
refinement is in progress. :-( I don't think this happens in practice,
because most problematic allocations occur in the young generation
rather than the old generation; however, in low-memory situations
and in the rare cases of large-object allocation in the CMS
generation, these races appear to be present.
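
A minimal C++ sketch of the nearClass handshake, assuming a simplified object layout: the class pointer doubles as the "initialized" flag. The allocator writes it last with release semantics, and a concurrent scanner reads it with acquire semantics and skips the object while it is NULL. ObjHeader, ClassDesc, initialize_and_publish and scan_object are hypothetical names, not EVM's or HotSpot's actual code, and the sketch does not attempt to fix the sweeping/refinement races described above.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a class descriptor (what nearClass points to).
struct ClassDesc {
    std::size_t instance_words;  // object size in words
};

// Hypothetical object layout: the class pointer is the publication flag.
struct ObjHeader {
    std::atomic<const ClassDesc*> near_class{nullptr};
    std::intptr_t fields[4];  // payload; fixed size for the sketch
};

// Allocator side: make the object parsable only after its body is initialized.
// Assumption: the block is not reachable by scanners until step 1 is visible.
void initialize_and_publish(ObjHeader* obj, const ClassDesc* cls) {
    assert(cls != nullptr);  // publishing NULL would never make it parsable
    // 1. Header starts NULL so concurrent scanners treat the block as unparsable.
    obj->near_class.store(nullptr, std::memory_order_relaxed);
    // 2. Initialize the body while scanners still see NULL.
    for (auto& f : obj->fields) f = 0;
    // 3. Release-store the class pointer last; orders step 2 before it.
    obj->near_class.store(cls, std::memory_order_release);
}

// Scanner side (e.g. a marking or refinement thread): returns the object's
// size in words, or 0 if it is not yet published and must be skipped/retried.
std::size_t scan_object(const ObjHeader* obj) {
    const ClassDesc* cls = obj->near_class.load(std::memory_order_acquire);
    if (cls == nullptr) {
        return 0;  // not yet initialized: do not interpret the body
    }
    // The acquire load pairs with the release store above, so the field
    // writes made before publication are visible to this thread.
    return cls->instance_words;
}
```

The release/acquire pairing is what makes "set nearClass last" meaningful on weakly ordered hardware; a plain store could let a scanner see the class pointer before the fields it describes.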
-- Alex
Relates to: JDK-4840070 hotspot crash in copy_to_survivor_space method (Closed)