Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8217413

[Testbug] runtime/ErrorHandling/TimeoutInErrorHandlingTest.java doesn't handle longer than expected timeouts

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Duplicate
    • Icon: P4 P4
    • None
    • 13
    • hotspot
    • sparc
    • solaris

      The test describes itself as follows:

       /* Start the VM and let it crash. Specify TestUnresponsiveErrorHandler which will
               * let five subsequent error reporting steps hang. The Timeout handling triggered
               * by the WatcherThread should kick in and interrupt those steps. In theory, the
               * text "timeout occurred during error reporting in step .." (the little timeouts)
               * should occur in the error log up to four times, followed by the final big timeout
               * "------ Timeout during error reporting after xx s. ------"
               *
               * Note that there are a number of uncertainties which make writing a 100% foolproof
               * test challenging. The time the error reporting thread takes to react to the
               * timeout triggers is unknown. So it is difficult to predict how many little timeouts
               * will be visible before the big timeout kicks in. Also, once the big timeout hits,
               * error reporting thread and Watcherthread will race. The former writes his last
               * message to the error logs and flushes, the latter waits 200ms and then exits the
               * process without further synchronization with the error reporting thread.
               *
               * Because of all this and the desire to write a bullet proof test which does
               * not fail sporadically, we will not test for the final timeout message nor for all
               * of the optimally expected little timeout messages. We just test for two of the
               * little timeout messages to see that repeated timeout handling is basically working.
               */

       "-XX:ErrorLogTimeout=16", // 16 seconds big timeout = 4 seconds per little timeout

      However in the failing case we see a "little timeout" of 8 seconds, which means we may hit the big timeout before seeing a second little timeout:

      ----------System.out:(3/1157)----------
      Command line: [/opt/mach5/mesos/work_dir/jib-master/install/jdk13-jdk.184/solaris-sparcv9-debug.jdk/jdk-13/fastdebug/bin/java -cp /opt/mach5/mesos/work_dir/44f96277-5e42-4ded-8c25-00977506cd0e/testOutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_tier2_runtime/classes/6/runtime/ErrorHandling/TimeoutInErrorHandlingTest.d:/opt/mach5/mesos/work_dir/jib-master/install/jdk13-jdk.184/src.full/open/test/hotspot/jtreg/runtime/ErrorHandling:/opt/mach5/mesos/work_dir/44f96277-5e42-4ded-8c25-00977506cd0e/testOutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_tier2_runtime/classes/6/test/lib:/opt/mach5/mesos/work_dir/jib-master/install/java/re/jtreg/4.2/promoted/all/b13/bundles/jtreg_bin-4.2.zip/jtreg/lib/javatest.jar:/opt/mach5/mesos/work_dir/jib-master/install/java/re/jtreg/4.2/promoted/all/b13/bundles/jtreg_bin-4.2.zip/jtreg/lib/jtreg.jar -XX:+UnlockDiagnosticVMOptions -Xmx100M -XX:ErrorHandlerTest=14 -XX:+TestUnresponsiveErrorHandler -XX:ErrorLogTimeout=16 -XX:-CreateCoredumpOnCrash -version ]
      Found hs_err file. Scanning...
      Found: [timeout occurred during error reporting in step "test unresponsive error reporting step"] after 8 s..
      ----------System.err:(12/764)----------
      java.lang.RuntimeException: hs-err file incomplete (first missing pattern: 1)
      at TimeoutInErrorHandlingTest.main(TimeoutInErrorHandlingTest.java:127)

      Or we need to understand why we get such a long timeout in the first place - but our Solaris sparcv9 debug testing is notorious for inducing timeouts!

            stuefe Thomas Stuefe
            dholmes David Holmes
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

              Created:
              Updated:
              Resolved: