Type: Bug
Resolution: Fixed
Priority: P3
Fix Version/s: hs24
Affects Version/s: 7u6
Component/s: hotspot
Labels:

Subcomponent: gc
Resolved In Build: b21
CPU: generic
OS: generic
Verification: Verified

Issue	Fix Version	Assignee	Priority	Status	Resolution	Resolved In Build
JDK-2229157	8	John Cuthbertson	P3	Resolved	Fixed	b54
JDK-8017959	7u45	John Cuthbertson	P3	Closed	Fixed	b01
JDK-8002792	7u40	John Cuthbertson	P3	Resolved	Fixed	b01

During testing of the PermGen removal changes, the development engineers started to see assertion failures and crashes in G1's implementation of the BlockOffsetTable (BOT).

Investigation indicated that these crashes:

* were the result of incorrect and inconsistent values in the BOT's offset array,
* always seemed to be happening on one particular platform type (that is part of the testing and integration infrastructure),
* seemed to have increased in frequency since the inclusion of the changes for 6818524 into the PermGen removal code.

Specifically the assert being seen was:

# Internal Error (/tmp/jprt/P1/173804.cphillim/s/src/share/vm/gc_implementation/g1/g1BlockOffsetTable.cpp:552), pid=5210, tid=27
# assert(_array->offset_array(j) > 0 && _array->offset_array(j) <= (u_char) (N_words+BlockOffsetArray::N_powers-1)) failed: offset array should have been set

The assert was extended to print out the values being compared:

> --- a/src/share/vm/gc_implementation/g1/g1BlockOffsetTable.cpp
> +++ b/src/share/vm/gc_implementation/g1/g1BlockOffsetTable.cpp
> @@ -546,7 +546,10 @@
> assert(_array->offset_array(j) > 0 &&
> _array->offset_array(j) <=
> (u_char) (N_words+BlockOffsetArray::N_powers-1),
> - "offset array should have been set");
> + err_msg("offset array should have been set "
> + SIZE_FORMAT " not > 0 OR " SIZE_FORMAT " not <= "
> + SIZE_FORMAT, _array->offset_array(j),
> _array->offset_array(j),
> + (N_words+BlockOffsetArray::N_powers-1)));
> }
> #endif
> }

which yielded the following:

# Internal Error (/tmp/jprt/P1/173804.cphillim/s/src/share/vm/gc_implementation/g1/g1BlockOffsetTable.cpp:552), pid=5210, tid=27
# assert(_array->offset_array(j) > 0 && _array->offset_array(j) <= (u_char) (N_words+BlockOffsetArray::N_powers-1)) failed: offset array should have been set 65 not > 0 OR 65 not <= 77

In the output above, the value in the offset array is printed as 65, yet it failed a check that it is strictly greater than zero and no more than 77. So how could this assertion fail with a value of 65?

An investigation of the G1 BOT code and a comparison against the BOT for the other collectors indicated that G1 (as a result of the increased size of old-gen PLABs) might be running into the same issue as 6948537. Namely, concurrent readers of the G1 BOT (the concurrent refinement threads) were seeing spurious zeros in BOT entries. The error message printed 65, but 65 should have passed the check. This is consistent with the read in the assert condition observing a transient zero and the later read for the message observing the final value.
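The following is a minimal, single-threaded sketch of how an assert of this shape can fail while its message prints a passing value. It is not HotSpot code: `RacyEntry` and `check_entry` are hypothetical names, and the constants assume a 64-bit VM with 512-byte cards (64 words per card) and an `N_powers` of 14, which together reproduce the bound of 77 seen in the message.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumed constants (not taken from the report): 64-bit VM, 512-byte cards.
const std::size_t kNWords   = 64;  // words per card: 512 bytes / 8 bytes
const std::size_t kNPowers  = 14;  // assumed value of BlockOffsetArray::N_powers
const std::size_t kMaxEntry = kNWords + kNPowers - 1;  // 64 + 14 - 1 = 77

// Hypothetical stand-in for one BOT entry that is being written concurrently:
// the first `zero_reads` reads observe a transient zero, later reads observe
// the final value.
class RacyEntry {
 public:
  RacyEntry(unsigned char final_value, int zero_reads)
      : final_value_(final_value), zero_reads_(zero_reads) {
    assert(zero_reads >= 0);
  }
  std::size_t read() {
    if (zero_reads_ > 0) {
      --zero_reads_;
      return 0;
    }
    return final_value_;
  }

 private:
  unsigned char final_value_;
  int zero_reads_;
};

// Mirrors the shape of the extended assert: the condition reads the entry,
// and the message arguments read it again. Returns "" when the check passes.
std::string check_entry(RacyEntry& e) {
  if (e.read() > 0 && e.read() <= kMaxEntry) {
    return "";
  }
  // The entry is re-read here, after the failing comparison.
  std::size_t first  = e.read();
  std::size_t second = e.read();
  return "offset array should have been set " + std::to_string(first) +
         " not > 0 OR " + std::to_string(second) + " not <= " +
         std::to_string(kMaxEntry);
}
```

With `RacyEntry e(65, 1)`, the condition sees the transient zero and fails, while the message re-reads the entry and reports 65 for both values. With `RacyEntry e(65, 0)` the check passes.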

I believe that the issue described in 6948537 matches the behavior seen in the failing assert. I also believe that resizable PLABs increase the likelihood of hitting the problem.

Prior to resizable PLABs, the size of PLABs for old regions was set to 1 KB, which is a span of 2 cards (note that in G1 we only refine cards in old regions concurrently). With resizable PLABs, this can increase. When we allocate a PLAB, we record its start in the BOT. Now suppose we allocate an old-gen PLAB that spans 10 cards, and we see updates to card 10 and card 2, which end up in two different update buffers. Suppose that a concurrent refinement (CR) thread gets one buffer and starts to process card 10; this causes the BOT to be updated from the start of the PLAB to the object spanned by card 10. Now suppose another CR thread gets the other buffer and starts to process card 2 while the BOT is being updated (or vice versa). The issue reported by Ramki may then come into play.
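The card arithmetic in the scenario above can be sketched as follows. This is a simplified model, not G1 code: it assumes 512-byte cards (which matches "1 KB is a span of 2 cards"), assumes the PLAB starts on a card boundary, and numbers cards from 1 within the PLAB. The names `cards_spanned` and `update_range_covers` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Assumed card size in bytes; chosen so that a 1 KB PLAB spans 2 cards.
const std::size_t kCardBytes = 512;

// Number of cards covered by a card-aligned PLAB of the given size.
std::size_t cards_spanned(std::size_t plab_bytes) {
  assert(plab_bytes > 0);
  return (plab_bytes + kCardBytes - 1) / kCardBytes;  // round up
}

// In this model, refining `updating_card` fills in BOT entries for cards
// 1..updating_card (from the PLAB start up to that card). Returns true when
// a concurrent reader working on `reading_card` would look at an entry
// inside that range.
bool update_range_covers(std::size_t updating_card, std::size_t reading_card) {
  assert(updating_card >= 1 && reading_card >= 1);
  return reading_card <= updating_card;
}
```

Under these assumptions `cards_spanned(1024)` is 2 and `cards_spanned(10 * 512)` is 10, and `update_range_covers(10, 2)` is true: the thread processing card 10 writes the entry that the thread processing card 2 reads. A larger PLAB gives a longer range of entries being written and so a wider window for a reader to overlap with it.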

By resizing the PLABs, we potentially have more BOT refinement going on.

backported by

JDK-2229157	G1: Extend fix for 6948537 to G1's BOT	Resolved
JDK-8002792	G1: Extend fix for 6948537 to G1's BOT	Resolved
JDK-8002793	G1: Extend fix for 6948537 to G1's BOT	Closed
JDK-8002794	G1: Extend fix for 6948537 to G1's BOT	Closed
JDK-8017959	G1: Extend fix for 6948537 to G1's BOT	Closed

relates to

JDK-6948537	CMS: BOT walkers observe out-of-thin-air zeros on sun4v sparc/CMT	Closed


Assignee: John Cuthbertson
Reporter: John Cuthbertson
Votes: 0
Watchers: 2

Created: 2012-08-16 14:52
Updated: 2013-09-18 04:55
Resolved: 2012-08-26 22:58
Imported: 2012-09-17 23:36
Indexed: 2012-08-31 02:14
