Enhancement
Resolution: Not an Issue
P4
21
This test uses async exceptions, and the async exception appears to be delivered at an inopportune point in the AQS code, which then leads to a subsequent NPE (probably from a finally block).
The strange thing is that this fails only on aarch64, and only intermittently.
To elaborate on how this failure mode can arise, we are executing this code:
    private void doAcquireSharedInterruptibly(int arg)
        throws InterruptedException {
        final Node node = addWaiter(Node.SHARED);
        boolean failed = true;
        try {
            for (;;) {
                final Node p = node.predecessor();
                if (p == head) {
                    int r = tryAcquireShared(arg);
                    if (r >= 0) {
                        setHeadAndPropagate(node, r); // <= node.prev is nulled in here
                        p.next = null; // help GC
                        failed = false;
                        return;
                    }
                }
                if (shouldParkAfterFailedAcquire(p, node) &&
                    parkAndCheckInterrupt())
                    throw new InterruptedException();
            }
        } finally {
            if (failed)
                cancelAcquire(node); // <= NPE comes out of here L1002
        }
    }
The NPE comes out of cancelAcquire, which is executed in the finally block. The code for that is:
    private void cancelAcquire(Node node) {
        // Ignore if node doesn't exist
        if (node == null)
            return;
        node.thread = null;
        // Skip cancelled predecessors
        Node pred = node.prev;
        while (pred.waitStatus > 0) // <= NPE here
The NPE indicates that pred is null, and therefore that node.prev was null. Looking back at the calling code, we can determine that prev is set to null in setHeadAndPropagate. So if the ThreadDeath hits after that field is nulled and before failed is set to false, the finally block calls cancelAcquire and we get the NPE.
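The window described above can be reproduced deterministically with a stripped-down model. This is only a sketch, not the AQS code: the Node class, the SimulatedAsyncException type, and the acquire, cancel and run methods are hypothetical stand-ins, and the "async" exception is thrown synchronously at the point where the ThreadDeath is assumed to arrive.

```java
public class AsyncExceptionWindowDemo {
    // Simplified stand-in for an AQS queue node; only the fields used here.
    static final class Node {
        Node prev;
        int waitStatus;
    }

    // Hypothetical stand-in for the asynchronously delivered exception.
    static final class SimulatedAsyncException extends RuntimeException {}

    // Mirrors the start of cancelAcquire: dereferences node.prev unchecked.
    static void cancel(Node node) {
        Node pred = node.prev;
        while (pred.waitStatus > 0) { // NPE if node.prev was already nulled
            pred = pred.prev;
        }
    }

    // Mirrors the success path of doAcquireSharedInterruptibly.
    static void acquire(Node node, boolean injectAsyncException) {
        boolean failed = true;
        try {
            node.prev = null; // the effect of setHeadAndPropagate on the node
            if (injectAsyncException) {
                // Delivered after prev is nulled but before failed = false.
                throw new SimulatedAsyncException();
            }
            failed = false;
        } finally {
            if (failed) {
                cancel(node); // runs with node.prev == null
            }
        }
    }

    // Returns whatever escapes acquire, or null on normal completion.
    static Throwable run(boolean injectAsyncException) {
        Node head = new Node();
        Node node = new Node();
        node.prev = head;
        try {
            acquire(node, injectAsyncException);
            return null;
        } catch (Throwable t) {
            return t;
        }
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true) instanceof NullPointerException);
    }
}
```

Because the NPE is thrown from the finally block, it replaces the in-flight exception, which is why the caller observes an NPE rather than the original async exception.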
So the only "fix" here would be to add NPE to the allowed exceptions in the test.