-
Bug
-
Resolution: Fixed
-
P2
-
11, 13
-
Linux-x64, Windows-x64
-
b22
-
generic
-
generic
-
Not verified
Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-8228206 | 11.0.6-oracle | Poonam Bajaj Parhar | P2 | Resolved | Fixed | b01 |
JDK-8224923 | 11.0.5-oracle | Poonam Bajaj Parhar | P2 | Resolved | Fixed | b02 |
JDK-8225417 | 11.0.5 | Markus Grönlund | P2 | Resolved | Fixed | b01 |
JDK-8225001 | 11.0.4-oracle | Poonam Bajaj Parhar | P2 | Closed | Fixed | b07 |
JDK-8225241 | 11.0.4 | Markus Grönlund | P2 | Resolved | Fixed | b06 |
JDK-8248508 | openjdk8u272 | Andrey Petushkov | P2 | Resolved | Fixed | b01 |
JDK-8247742 | openjdk8u262 | Andrey Petushkov | P2 | Resolved | Fixed | b08 |
Hi,
there is a deadlock triggered by the JFR StringPool. The problem starts when the jfr_add_string_constant method tries to write a string and the called WriterHost::ensure_size() method find not enough memory in the current buffer. This leads to JfrStringPool::lease_buffer() being called indirectly, which then tries to get a suitable buffer from the memory space via mspace_get_free_lease_with_retry(). This will eventually try to get an already available buffer from the memory space via JfrMspaceRetrieval::get(size,..), which looks like this:
static Type* get(size_t size, Mspace* mspace, typename Mspace::Iterator& iterator, Thread* thread) {
while (iterator.has_next()) {
Type* const t = iterator.next();
if (t->retired()) continue;
if (t->try_acquire(thread)) {
assert(!t->retired(), "invariant");
if (t->free_size() >= size) {
return t;
}
t->set_retired();
mspace->register_full(t, thread);
}
}
return NULL;
When it finds a buffer in the list it does a try-lock on the buffer. Let's assume this works. Then it checks if the locked buffer can accomidate the data we want to write. If not, the buffer is set to retired and the register_full() method of the memory space is called. In this case it is the method JfrStringPool::register_full(), which looks like this:
void JfrStringPool::register_full(BufferPtr t, Thread* thread) {
// nothing here at the moment
assert(t->retired(), "invariant");
}
So the buffer we found is still locked. Later in the 'JFR Recorder Thread' a rotation is triggered. This causes JfrRecorderService::pre_safepoint_write() to eventually call the write StringPoolWriteOp::write() method of the locked buffer. This method first tries to get the lock of the buffer via acquire(), but this call will obviously never return.
Since the thread now hangs forever, the next safepoint request will block forever too. At this point the VM is more or less dead.
I could remove the deadlock by adding 't->release();' to the JfrStringPool::register_full() method, but since I don't have the best overview of the whole buffer handling, I'm not sure this is the correct fix.
You can reproduce the problem with the following program:
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
public class FileIoStresser implements Runnable {
public static void main(String[] args) {
for (int i = 0; i < 3; ++i) {
new Thread(new FileIoStresser()).start();
}
}
@Override
public void run() {
while (true) {
StringBuilder sb = new StringBuilder();
for (int i = (int) (Math.random() * 40 + 16); i >= 0; --i) {
sb.append((char) ('a' + Math.random() * 26));
}
File f = new File("aaaa" + sb.toString());
try (FileOutputStream fos = new FileOutputStream(f)) {
fos.write(1);
fos.write(2);
} catch (IOException e) {
// Ignore.
}
f.delete();
}
}
}
This program creates small files in the current directory and deletes them directly (they all start with 'aaaa'). The sizes of the filenames is choosen, so that the string pool will be hit often. When you profile it with JFR and enable the File I/O Trace with a threshold of 0, the deadlock should occur in a few minutes. You can add -Xlog:safepoint=debug to see when the safepoint is requested but never finishes. And if you make the chunk size small via -XX:FlightRecorderOptions=maxchunksize=1M, the deadlock occurs faster.
Best regards,
Ralf
Here is an actual stack trace when the JfrStringPool::register_full() method is called:
JfrStringPool::register_full(JfrStringPoolBuffer * t, Thread * thread) Line 228
JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool>::register_full(JfrStringPoolBuffer * t, Thread * thread) Line 94
JfrMspaceRetrieval<JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> >::get(unsigned __int64 size, JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> * mspace, StopOnNullIterator<JfrDoublyLinkedList<JfrStringPoolBuffer>,StackObj> & iterator, Thread * thread) Line 56
JfrMspaceSequentialRetrieval<JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> >::get(unsigned __int64 size, JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> * mspace, Thread * thread) Line 84
JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool>::get(unsigned __int64 size, Thread * thread) Line 101
mspace_get_free<JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> >(unsigned __int64 size, JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> * mspace, Thread * thread) Line 212 C++
mspace_get_free_with_retry<JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> >(unsigned __int64 size, JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> * mspace, unsigned __int64 retry_count, Thread * thread) Line 218
mspace_get_free_lease_with_retry<JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> >(unsigned __int64 size, JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> * mspace, unsigned __int64 retry_count, Thread * thread) Line 263
JfrStringPool::lease_buffer(Thread * thread, unsigned __int64 size) Line 122
JfrStringPool::flush(JfrStringPoolBuffer * old, unsigned __int64 used, unsigned __int64 requested, Thread * thread) Line 111
JfrStringPoolFlush::JfrStringPoolFlush(JfrStringPoolBuffer * old, unsigned __int64 used, unsigned __int64 requested, Thread * t) Line 32
Adapter<JfrStringPoolFlush>::flush(unsigned __int64 used, unsigned __int64 requested) Line 84
StorageHost<Adapter<JfrStringPoolFlush>,StackObj>::accommodate(unsigned __int64 used, unsigned __int64 requested) Line 66
WriterHost<EncoderHost<BigEndianEncoderImpl,BigEndianEncoderImpl>,EncoderHost<Varint128EncoderImpl,BigEndianEncoderImpl>,AcquireReleaseMemoryWriterHost<Adapter<JfrStringPoolFlush>,StackObj> >::ensure_size(unsigned __int64 requested) Line 172
WriterHost<EncoderHost<BigEndianEncoderImpl,BigEndianEncoderImpl>,EncoderHost<Varint128EncoderImpl,BigEndianEncoderImpl>,AcquireReleaseMemoryWriterHost<Adapter<JfrStringPoolFlush>,StackObj> >::be_write<signed char>(const char * value, unsigned __int64 len) Line 128
WriterHost<EncoderHost<BigEndianEncoderImpl,BigEndianEncoderImpl>,EncoderHost<Varint128EncoderImpl,BigEndianEncoderImpl>,AcquireReleaseMemoryWriterHost<Adapter<JfrStringPoolFlush>,StackObj> >::_write(_jstring * string) Line 233
JfrStringPool::add(bool epoch, __int64 id, _jstring * string, JavaThread * jt) Line 138
jfr_add_string_constant(JNIEnv_ * env, _jclass * jvm, unsigned char epoch, __int64 id, _jstring * string) Line 304
And here is a stack trace for the stuck recorder thread:
OrderAccess::load_acquire<CompiledMethod *>(CompiledMethod * const volatile * p) Line 338
JfrBuffer::acquire(const void * id) Line 130
StringPoolWriteOp::_write(JfrStringPoolBuffer * buffer, const unsigned char * data, unsigned __int64 size) Line 154
ConcurrentWriteOp<StringPoolWriteOp>::process(JfrStringPoolBuffer * t) Line 51
JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool>::iterate<ConcurrentWriteOp<StringPoolWriteOp>,StopOnNullIterator<JfrDoublyLinkedList<JfrStringPoolBuffer>,StackObj> >(ConcurrentWriteOp<StringPoolWriteOp> & callback, bool full, jfr_iter_direction direction) Line 122
process_free_list_iterator_control<ConcurrentWriteOp<StringPoolWriteOp>,JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool>,StopOnNullIterator<JfrDoublyLinkedList<JfrStringPoolBuffer>,StackObj> >(ConcurrentWriteOp<StringPoolWriteOp> & processor, JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> * mspace, jfr_iter_direction direction) Line 325
process_free_list<ConcurrentWriteOp<StringPoolWriteOp>,JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> >(ConcurrentWriteOp<StringPoolWriteOp> & processor, JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> * mspace, jfr_iter_direction direction) Line 344
JfrStringPool::_write() Line 175
ServiceFunctor<JfrStringPool,&JfrStringPool::_write>::process() Line 192
WriteCheckpointEvent<ServiceFunctor<JfrStringPool,&JfrStringPool::_write> >::process() Line 168
write_stringpool_checkpoint(JfrStringPool & string_pool, JfrChunkWriter & chunkwriter) Line 405
JfrRecorderService::pre_safepoint_write() Line 431
JfrRecorderService::_write() Line 385
JfrRecorderService::finalize_current_chunk() Line 378
JfrRecorderService::chunk_rotation() Line 372
JfrRecorderService::rotate(int msgs) Line 331
recorderthread_entry(JavaThread * thread, Thread * unused) Line 76
JavaThread::thread_main_inner() Line 1870
JavaThread::run() Line 1853
Thread::call_run() Line 405
thread_native_entry(Thread * thread) Line 470
there is a deadlock triggered by the JFR StringPool. The problem starts when the jfr_add_string_constant method tries to write a string and the called WriterHost::ensure_size() method find not enough memory in the current buffer. This leads to JfrStringPool::lease_buffer() being called indirectly, which then tries to get a suitable buffer from the memory space via mspace_get_free_lease_with_retry(). This will eventually try to get an already available buffer from the memory space via JfrMspaceRetrieval::get(size,..), which looks like this:
static Type* get(size_t size, Mspace* mspace, typename Mspace::Iterator& iterator, Thread* thread) {
while (iterator.has_next()) {
Type* const t = iterator.next();
if (t->retired()) continue;
if (t->try_acquire(thread)) {
assert(!t->retired(), "invariant");
if (t->free_size() >= size) {
return t;
}
t->set_retired();
mspace->register_full(t, thread);
}
}
return NULL;
When it finds a buffer in the list it does a try-lock on the buffer. Let's assume this works. Then it checks if the locked buffer can accomidate the data we want to write. If not, the buffer is set to retired and the register_full() method of the memory space is called. In this case it is the method JfrStringPool::register_full(), which looks like this:
void JfrStringPool::register_full(BufferPtr t, Thread* thread) {
// nothing here at the moment
assert(t->retired(), "invariant");
}
So the buffer we found is still locked. Later in the 'JFR Recorder Thread' a rotation is triggered. This causes JfrRecorderService::pre_safepoint_write() to eventually call the write StringPoolWriteOp::write() method of the locked buffer. This method first tries to get the lock of the buffer via acquire(), but this call will obviously never return.
Since the thread now hangs forever, the next safepoint request will block forever too. At this point the VM is more or less dead.
I could remove the deadlock by adding 't->release();' to the JfrStringPool::register_full() method, but since I don't have the best overview of the whole buffer handling, I'm not sure this is the correct fix.
You can reproduce the problem with the following program:
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
public class FileIoStresser implements Runnable {
public static void main(String[] args) {
for (int i = 0; i < 3; ++i) {
new Thread(new FileIoStresser()).start();
}
}
@Override
public void run() {
while (true) {
StringBuilder sb = new StringBuilder();
for (int i = (int) (Math.random() * 40 + 16); i >= 0; --i) {
sb.append((char) ('a' + Math.random() * 26));
}
File f = new File("aaaa" + sb.toString());
try (FileOutputStream fos = new FileOutputStream(f)) {
fos.write(1);
fos.write(2);
} catch (IOException e) {
// Ignore.
}
f.delete();
}
}
}
This program creates small files in the current directory and deletes them directly (they all start with 'aaaa'). The sizes of the filenames is choosen, so that the string pool will be hit often. When you profile it with JFR and enable the File I/O Trace with a threshold of 0, the deadlock should occur in a few minutes. You can add -Xlog:safepoint=debug to see when the safepoint is requested but never finishes. And if you make the chunk size small via -XX:FlightRecorderOptions=maxchunksize=1M, the deadlock occurs faster.
Best regards,
Ralf
Here is an actual stack trace when the JfrStringPool::register_full() method is called:
JfrStringPool::register_full(JfrStringPoolBuffer * t, Thread * thread) Line 228
JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool>::register_full(JfrStringPoolBuffer * t, Thread * thread) Line 94
JfrMspaceRetrieval<JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> >::get(unsigned __int64 size, JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> * mspace, StopOnNullIterator<JfrDoublyLinkedList<JfrStringPoolBuffer>,StackObj> & iterator, Thread * thread) Line 56
JfrMspaceSequentialRetrieval<JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> >::get(unsigned __int64 size, JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> * mspace, Thread * thread) Line 84
JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool>::get(unsigned __int64 size, Thread * thread) Line 101
mspace_get_free<JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> >(unsigned __int64 size, JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> * mspace, Thread * thread) Line 212 C++
mspace_get_free_with_retry<JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> >(unsigned __int64 size, JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> * mspace, unsigned __int64 retry_count, Thread * thread) Line 218
mspace_get_free_lease_with_retry<JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> >(unsigned __int64 size, JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> * mspace, unsigned __int64 retry_count, Thread * thread) Line 263
JfrStringPool::lease_buffer(Thread * thread, unsigned __int64 size) Line 122
JfrStringPool::flush(JfrStringPoolBuffer * old, unsigned __int64 used, unsigned __int64 requested, Thread * thread) Line 111
JfrStringPoolFlush::JfrStringPoolFlush(JfrStringPoolBuffer * old, unsigned __int64 used, unsigned __int64 requested, Thread * t) Line 32
Adapter<JfrStringPoolFlush>::flush(unsigned __int64 used, unsigned __int64 requested) Line 84
StorageHost<Adapter<JfrStringPoolFlush>,StackObj>::accommodate(unsigned __int64 used, unsigned __int64 requested) Line 66
WriterHost<EncoderHost<BigEndianEncoderImpl,BigEndianEncoderImpl>,EncoderHost<Varint128EncoderImpl,BigEndianEncoderImpl>,AcquireReleaseMemoryWriterHost<Adapter<JfrStringPoolFlush>,StackObj> >::ensure_size(unsigned __int64 requested) Line 172
WriterHost<EncoderHost<BigEndianEncoderImpl,BigEndianEncoderImpl>,EncoderHost<Varint128EncoderImpl,BigEndianEncoderImpl>,AcquireReleaseMemoryWriterHost<Adapter<JfrStringPoolFlush>,StackObj> >::be_write<signed char>(const char * value, unsigned __int64 len) Line 128
WriterHost<EncoderHost<BigEndianEncoderImpl,BigEndianEncoderImpl>,EncoderHost<Varint128EncoderImpl,BigEndianEncoderImpl>,AcquireReleaseMemoryWriterHost<Adapter<JfrStringPoolFlush>,StackObj> >::_write(_jstring * string) Line 233
JfrStringPool::add(bool epoch, __int64 id, _jstring * string, JavaThread * jt) Line 138
jfr_add_string_constant(JNIEnv_ * env, _jclass * jvm, unsigned char epoch, __int64 id, _jstring * string) Line 304
And here is a stack trace for the stuck recorder thread:
OrderAccess::load_acquire<CompiledMethod *>(CompiledMethod * const volatile * p) Line 338
JfrBuffer::acquire(const void * id) Line 130
StringPoolWriteOp::_write(JfrStringPoolBuffer * buffer, const unsigned char * data, unsigned __int64 size) Line 154
ConcurrentWriteOp<StringPoolWriteOp>::process(JfrStringPoolBuffer * t) Line 51
JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool>::iterate<ConcurrentWriteOp<StringPoolWriteOp>,StopOnNullIterator<JfrDoublyLinkedList<JfrStringPoolBuffer>,StackObj> >(ConcurrentWriteOp<StringPoolWriteOp> & callback, bool full, jfr_iter_direction direction) Line 122
process_free_list_iterator_control<ConcurrentWriteOp<StringPoolWriteOp>,JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool>,StopOnNullIterator<JfrDoublyLinkedList<JfrStringPoolBuffer>,StackObj> >(ConcurrentWriteOp<StringPoolWriteOp> & processor, JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> * mspace, jfr_iter_direction direction) Line 325
process_free_list<ConcurrentWriteOp<StringPoolWriteOp>,JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> >(ConcurrentWriteOp<StringPoolWriteOp> & processor, JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> * mspace, jfr_iter_direction direction) Line 344
JfrStringPool::_write() Line 175
ServiceFunctor<JfrStringPool,&JfrStringPool::_write>::process() Line 192
WriteCheckpointEvent<ServiceFunctor<JfrStringPool,&JfrStringPool::_write> >::process() Line 168
write_stringpool_checkpoint(JfrStringPool & string_pool, JfrChunkWriter & chunkwriter) Line 405
JfrRecorderService::pre_safepoint_write() Line 431
JfrRecorderService::_write() Line 385
JfrRecorderService::finalize_current_chunk() Line 378
JfrRecorderService::chunk_rotation() Line 372
JfrRecorderService::rotate(int msgs) Line 331
recorderthread_entry(JavaThread * thread, Thread * unused) Line 76
JavaThread::thread_main_inner() Line 1870
JavaThread::run() Line 1853
Thread::call_run() Line 405
thread_native_entry(Thread * thread) Line 470
- backported by
-
JDK-8224923 Deadlock in JFR string pool
- Resolved
-
JDK-8225241 Deadlock in JFR string pool
- Resolved
-
JDK-8225417 Deadlock in JFR string pool
- Resolved
-
JDK-8228206 Deadlock in JFR string pool
- Resolved
-
JDK-8247742 Deadlock in JFR string pool
- Resolved
-
JDK-8248508 Deadlock in JFR string pool
- Resolved
-
JDK-8225001 Deadlock in JFR string pool
- Closed
- blocks
-
JDK-8224139 Deprecate -XX:FlightRecorder option
- Resolved
(2 backported by, 1 blocks)