
Improve behavior when safepoint begin times out


      We recently had an instance of the PageArmed == 0 guarantee failing (see JDK-8155700 and JDK-8038480 for other cases of this). In our case we are pretty sure that the underlying cause was a flaky host that caused a thread (or threads) to somehow get stuck and never ack the safepoint. However, it would have been nice if the JVM handled the situation a bit better.

      What I'd like to improve is:

      First, eliminate the unintentional attempt inside the loop to arm the polling page when the {{iterations}} variable overflows.

      Second, introduce a time-based heuristic that forces the JVM to abort when it has been stuck in the loop for far too long. I think the best way is probably a new command-line argument that specifies how long to wait before aborting, set to something conservative such as 30 minutes or an hour. We could re-use the SafepointTimeout / DieOnSafepointTimeout args for this. However, I think it's nice to get a warning early (the current 10-second default for SafepointTimeoutDelay is reasonable IMHO) and to abort much later.
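To make the two proposals concrete, here is a rough sketch of what such a wait loop could look like. It is not HotSpot code: `wait_for_safepoint`, `WaitStats`, `WaitResult`, and all parameters are hypothetical names, and the iteration counter is deliberately only 8 bits wide so that it wraps quickly. The explicit `armed` flag means a wrapped counter cannot trigger arming a second time, and the two deadlines give an early warning followed by a much later abort.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>

// Hypothetical names throughout; this is an illustration, not HotSpot code.
enum class WaitResult { AllStopped, Aborted };

struct WaitStats {
    int  arm_calls = 0;      // how many times the polling page was "armed"
    bool warned    = false;  // whether the early warning fired
};

// Spins until still_running() returns 0 or abort_after has elapsed.
// now() is injected so the loop can be driven by a fake clock.
inline WaitResult wait_for_safepoint(
        const std::function<int()>& still_running,
        const std::function<std::chrono::milliseconds()>& now,
        std::chrono::milliseconds warn_after,
        std::chrono::milliseconds abort_after,
        std::uint8_t arm_at_iteration,
        WaitStats& stats) {
    const auto start = now();
    bool armed = false;            // explicit flag: arm at most once
    std::uint8_t iterations = 0;   // narrow on purpose, wraps after 255

    while (still_running() > 0) {
        // The flag, not the counter value alone, decides whether to arm,
        // so a wrapped counter cannot re-trigger this branch.
        if (!armed && iterations >= arm_at_iteration) {
            armed = true;
            ++stats.arm_calls;     // stand-in for arming the polling page
        }

        const auto elapsed = now() - start;
        if (!stats.warned && elapsed >= warn_after) {
            stats.warned = true;   // stand-in for logging a timeout warning
        }
        if (elapsed >= abort_after) {
            return WaitResult::Aborted;  // stand-in for aborting the VM
        }
        ++iterations;              // unsigned wrap-around is well defined
    }
    return WaitResult::AllStopped;
}
```

With a thread that never stops, the loop arms once, warns once the shorter delay has passed, and gives up only after the longer one. Whether the abort delay should be a new flag or derived from the existing ones is the open question above.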

      Thoughts?

            rehn Robbin Ehn
            tonyp Tony Printezis
