Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8220293

Deadlock in JFR string pool

XMLWordPrintable

    • jfr
    • b22
    • generic
    • generic
    • Not verified

        Hi,

        there is a deadlock triggered by the JFR StringPool. The problem starts when the jfr_add_string_constant method tries to write a string and the called WriterHost::ensure_size() method find not enough memory in the current buffer. This leads to JfrStringPool::lease_buffer() being called indirectly, which then tries to get a suitable buffer from the memory space via mspace_get_free_lease_with_retry(). This will eventually try to get an already available buffer from the memory space via JfrMspaceRetrieval::get(size,..), which looks like this:

          static Type* get(size_t size, Mspace* mspace, typename Mspace::Iterator& iterator, Thread* thread) {
            while (iterator.has_next()) {
              Type* const t = iterator.next();
              if (t->retired()) continue;
              if (t->try_acquire(thread)) {
                assert(!t->retired(), "invariant");
                if (t->free_size() >= size) {
                  return t;
                }
                t->set_retired();
                mspace->register_full(t, thread);
              }
            }
            return NULL;

        When it finds a buffer in the list it does a try-lock on the buffer. Let's assume this works. Then it checks if the locked buffer can accomidate the data we want to write. If not, the buffer is set to retired and the register_full() method of the memory space is called. In this case it is the method JfrStringPool::register_full(), which looks like this:

        void JfrStringPool::register_full(BufferPtr t, Thread* thread) {
          // nothing here at the moment
          assert(t->retired(), "invariant");
        }

        So the buffer we found is still locked. Later in the 'JFR Recorder Thread' a rotation is triggered. This causes JfrRecorderService::pre_safepoint_write() to eventually call the write StringPoolWriteOp::write() method of the locked buffer. This method first tries to get the lock of the buffer via acquire(), but this call will obviously never return.

        Since the thread now hangs forever, the next safepoint request will block forever too. At this point the VM is more or less dead.

        I could remove the deadlock by adding 't->release();' to the JfrStringPool::register_full() method, but since I don't have the best overview of the whole buffer handling, I'm not sure this is the correct fix.


        You can reproduce the problem with the following program:

        import java.io.File;
        import java.io.FileOutputStream;
        import java.io.IOException;

        public class FileIoStresser implements Runnable {

            public static void main(String[] args) {
                for (int i = 0; i < 3; ++i) {
                    new Thread(new FileIoStresser()).start();
                }
            }

            @Override
            public void run() {
                while (true) {
                    StringBuilder sb = new StringBuilder();

                    for (int i = (int) (Math.random() * 40 + 16); i >= 0; --i) {
                        sb.append((char) ('a' + Math.random() * 26));
                    }

                    File f = new File("aaaa" + sb.toString());

                    try (FileOutputStream fos = new FileOutputStream(f)) {
                        fos.write(1);
                        fos.write(2);
                    } catch (IOException e) {
                        // Ignore.
                    }

                    f.delete();
                }
            }
        }

        This program creates small files in the current directory and deletes them directly (they all start with 'aaaa'). The sizes of the filenames is choosen, so that the string pool will be hit often. When you profile it with JFR and enable the File I/O Trace with a threshold of 0, the deadlock should occur in a few minutes. You can add -Xlog:safepoint=debug to see when the safepoint is requested but never finishes. And if you make the chunk size small via -XX:FlightRecorderOptions=maxchunksize=1M, the deadlock occurs faster.


        Best regards,
        Ralf





        Here is an actual stack trace when the JfrStringPool::register_full() method is called:

        JfrStringPool::register_full(JfrStringPoolBuffer * t, Thread * thread) Line 228
        JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool>::register_full(JfrStringPoolBuffer * t, Thread * thread) Line 94
        JfrMspaceRetrieval<JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> >::get(unsigned __int64 size, JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> * mspace, StopOnNullIterator<JfrDoublyLinkedList<JfrStringPoolBuffer>,StackObj> & iterator, Thread * thread) Line 56
        JfrMspaceSequentialRetrieval<JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> >::get(unsigned __int64 size, JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> * mspace, Thread * thread) Line 84
        JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool>::get(unsigned __int64 size, Thread * thread) Line 101
        mspace_get_free<JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> >(unsigned __int64 size, JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> * mspace, Thread * thread) Line 212 C++
        mspace_get_free_with_retry<JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> >(unsigned __int64 size, JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> * mspace, unsigned __int64 retry_count, Thread * thread) Line 218
        mspace_get_free_lease_with_retry<JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> >(unsigned __int64 size, JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> * mspace, unsigned __int64 retry_count, Thread * thread) Line 263
        JfrStringPool::lease_buffer(Thread * thread, unsigned __int64 size) Line 122
        JfrStringPool::flush(JfrStringPoolBuffer * old, unsigned __int64 used, unsigned __int64 requested, Thread * thread) Line 111
        JfrStringPoolFlush::JfrStringPoolFlush(JfrStringPoolBuffer * old, unsigned __int64 used, unsigned __int64 requested, Thread * t) Line 32
        Adapter<JfrStringPoolFlush>::flush(unsigned __int64 used, unsigned __int64 requested) Line 84
        StorageHost<Adapter<JfrStringPoolFlush>,StackObj>::accommodate(unsigned __int64 used, unsigned __int64 requested) Line 66
        WriterHost<EncoderHost<BigEndianEncoderImpl,BigEndianEncoderImpl>,EncoderHost<Varint128EncoderImpl,BigEndianEncoderImpl>,AcquireReleaseMemoryWriterHost<Adapter<JfrStringPoolFlush>,StackObj> >::ensure_size(unsigned __int64 requested) Line 172
        WriterHost<EncoderHost<BigEndianEncoderImpl,BigEndianEncoderImpl>,EncoderHost<Varint128EncoderImpl,BigEndianEncoderImpl>,AcquireReleaseMemoryWriterHost<Adapter<JfrStringPoolFlush>,StackObj> >::be_write<signed char>(const char * value, unsigned __int64 len) Line 128
        WriterHost<EncoderHost<BigEndianEncoderImpl,BigEndianEncoderImpl>,EncoderHost<Varint128EncoderImpl,BigEndianEncoderImpl>,AcquireReleaseMemoryWriterHost<Adapter<JfrStringPoolFlush>,StackObj> >::_write(_jstring * string) Line 233
        JfrStringPool::add(bool epoch, __int64 id, _jstring * string, JavaThread * jt) Line 138
        jfr_add_string_constant(JNIEnv_ * env, _jclass * jvm, unsigned char epoch, __int64 id, _jstring * string) Line 304


        And here is a stack trace for the stuck recorder thread:
        OrderAccess::load_acquire<CompiledMethod *>(CompiledMethod * const volatile * p) Line 338
        JfrBuffer::acquire(const void * id) Line 130
        StringPoolWriteOp::_write(JfrStringPoolBuffer * buffer, const unsigned char * data, unsigned __int64 size) Line 154
        ConcurrentWriteOp<StringPoolWriteOp>::process(JfrStringPoolBuffer * t) Line 51
        JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool>::iterate<ConcurrentWriteOp<StringPoolWriteOp>,StopOnNullIterator<JfrDoublyLinkedList<JfrStringPoolBuffer>,StackObj> >(ConcurrentWriteOp<StringPoolWriteOp> & callback, bool full, jfr_iter_direction direction) Line 122
        process_free_list_iterator_control<ConcurrentWriteOp<StringPoolWriteOp>,JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool>,StopOnNullIterator<JfrDoublyLinkedList<JfrStringPoolBuffer>,StackObj> >(ConcurrentWriteOp<StringPoolWriteOp> & processor, JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> * mspace, jfr_iter_direction direction) Line 325
        process_free_list<ConcurrentWriteOp<StringPoolWriteOp>,JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> >(ConcurrentWriteOp<StringPoolWriteOp> & processor, JfrMemorySpace<JfrStringPoolBuffer,JfrMspaceSequentialRetrieval,JfrStringPool> * mspace, jfr_iter_direction direction) Line 344
        JfrStringPool::_write() Line 175
        ServiceFunctor<JfrStringPool,&JfrStringPool::_write>::process() Line 192
        WriteCheckpointEvent<ServiceFunctor<JfrStringPool,&JfrStringPool::_write> >::process() Line 168
        write_stringpool_checkpoint(JfrStringPool & string_pool, JfrChunkWriter & chunkwriter) Line 405
        JfrRecorderService::pre_safepoint_write() Line 431
        JfrRecorderService::_write() Line 385
        JfrRecorderService::finalize_current_chunk() Line 378
        JfrRecorderService::chunk_rotation() Line 372
        JfrRecorderService::rotate(int msgs) Line 331
        recorderthread_entry(JavaThread * thread, Thread * unused) Line 76
        JavaThread::thread_main_inner() Line 1870
        JavaThread::run() Line 1853
        Thread::call_run() Line 405
        thread_native_entry(Thread * thread) Line 470

              mgronlun Markus Grönlund
              rschmelter Ralf Schmelter
              Votes:
              0 Vote for this issue
              Watchers:
              13 Start watching this issue

                Created:
                Updated:
                Resolved: