Several memory fences are needed in the concurrent mark sweep GC source code. This has caused intermittent crashes that's very hard to trace.
1. In ConcurrentMarkSweepGeneration::par_promote():
First in a call to alloc(), a free block is marked as not free by storing 0 to _prev field, which corresponds to the _klass field in an oop header. Next copy_words_aligned is called to fill in the other fields of the object. Last _klass field is set to the correct value. These three steps have to become visible exactly in this order.
But there is no code to force this ordering on a weakly ordered machine.
The fix is to add a call to membar() after the call to alloc(), then make the last store to _klass a volatile (release) store.
2. Corresponding to 1, in CompactibleFreeListSpace::block_is_obj():
The read from _klass needs to be changed to a volatile (acquire) load to make sure the last store to _klass in ConcurrentMarkSweepGeneration::par_promote() is observed only after all other stores to the promoted object.
3. The following comment
// Update BOT last so that other (parallel) GC threads see a consistent
// view of the BOT and free blocks.
// Above must occur before BOT is updated below.
is seen in CompactibleFreeListSpace::getChunkFromLinearAllocBlock(), CompactibleFreeListSpace::getChunkFromLinearAllocBlockRemainder(), CompactibleFreeListSpace::splitChunkAndReturnRemainder(), and CompactibleFreeListSpace::par_get_chunk_of_blocks(). But there is no code that forces the order on a weakly ordered machine. A membar is needed for each instance.
more missing fences
inline bool OopTaskQueue::push(Task t) {
...
_elems[localBot] = (TskET*) t;
Atomic::write_barrier(); // MISSING BARRIER HERE <-- NOT IN SUN CODE????
_bottom = increment_index(localBot);|
...
|}
template<class E> class TaskQueue : public GenericTaskQueueSuper {
// Slow paths for push, pop_local. (pop_global has no fast path.)
bool push_slow(E t, juint dirty_n_elems) {
if (dirty_n_elems == n() - 1) {
// Actually means 0, so do the push.
juint localBot = _bottom;
_elems[localBot] = (TskET*) t;
Atomic::write_barrier(); // Yet Another Missing Barrier in SUN
CODE????
_bottom = increment_index(localBot);
return true;
} else
return false;
}
bool OopTaskQueue::pop_global(Task& t) {
Age newAge;
Age oldAge = get_age();
Atomic::membar(); // Yet Another Missing Barrier in SUN CODE???
juint localBot = _bottom;
juint n_elems = size(localBot, oldAge.top());
...
bool GenericTaskQueueSuper::
pop_local_slow(juint localBot, Age oldAge) {
...
set_age(newAge);
Atomic::membar(); // Yet Another Missing Barrier in SUN CODE???
assert(dirty_size(localBot, get_top()) != n() - 1,
"Shouldn't be possible...");
...
1. In ConcurrentMarkSweepGeneration::par_promote():
First in a call to alloc(), a free block is marked as not free by storing 0 to _prev field, which corresponds to the _klass field in an oop header. Next copy_words_aligned is called to fill in the other fields of the object. Last _klass field is set to the correct value. These three steps have to become visible exactly in this order.
But there is no code to force this ordering on a weakly ordered machine.
The fix is to add a call to membar() after the call to alloc(), then make the last store to _klass a volatile (release) store.
2. Corresponding to 1, in CompactibleFreeListSpace::block_is_obj():
The read from _klass needs to be changed to a volatile (acquire) load to make sure the last store to _klass in ConcurrentMarkSweepGeneration::par_promote() is observed only after all other stores to the promoted object.
3. The following comment
// Update BOT last so that other (parallel) GC threads see a consistent
// view of the BOT and free blocks.
// Above must occur before BOT is updated below.
is seen in CompactibleFreeListSpace::getChunkFromLinearAllocBlock(), CompactibleFreeListSpace::getChunkFromLinearAllocBlockRemainder(), CompactibleFreeListSpace::splitChunkAndReturnRemainder(), and CompactibleFreeListSpace::par_get_chunk_of_blocks(). But there is no code that forces the order on a weakly ordered machine. A membar is needed for each instance.
more missing fences
inline bool OopTaskQueue::push(Task t) {
...
_elems[localBot] = (TskET*) t;
Atomic::write_barrier(); // MISSING BARRIER HERE <-- NOT IN SUN CODE????
_bottom = increment_index(localBot);|
...
|}
template<class E> class TaskQueue : public GenericTaskQueueSuper {
// Slow paths for push, pop_local. (pop_global has no fast path.)
bool push_slow(E t, juint dirty_n_elems) {
if (dirty_n_elems == n() - 1) {
// Actually means 0, so do the push.
juint localBot = _bottom;
_elems[localBot] = (TskET*) t;
Atomic::write_barrier(); // Yet Another Missing Barrier in SUN
CODE????
_bottom = increment_index(localBot);
return true;
} else
return false;
}
bool OopTaskQueue::pop_global(Task& t) {
Age newAge;
Age oldAge = get_age();
Atomic::membar(); // Yet Another Missing Barrier in SUN CODE???
juint localBot = _bottom;
juint n_elems = size(localBot, oldAge.top());
...
bool GenericTaskQueueSuper::
pop_local_slow(juint localBot, Age oldAge) {
...
set_age(newAge);
Atomic::membar(); // Yet Another Missing Barrier in SUN CODE???
assert(dirty_size(localBot, get_top()) != n() - 1,
"Shouldn't be possible...");
...
- duplicates
-
JDK-6858243 CMS: Fencing needed for hw with weak memory model
-
- Closed
-
- relates to
-
JDK-8006971 More missing barriers in taskqueues for RMO architectures
-
- Closed
-
-
JDK-6888847 TaskQueue needs release_store() for correctness on RMO machines
-
- Resolved
-