  1. Code Tools
  2. CODETOOLS-7903754

jcstress should die ASAP, with a report, on a broken JVM/HW instead of waiting to be killed

Details

    • Bug
    • Resolution: Unresolved
    • P4
    • None
    • None
    • tools
    • None

    Description

      It may happen that the VM is simply borked, or that the HW has gone wrong.
      In that case, jcstress slowly rolls on with zero passes, e.g.:
      https://ci.adoptium.net/view/Test_grinder/job/Grinder/10399/console (will disappear in some time)
      ```
      18:18:22 (Time: overtime 03:36:27, -16577 tests in flight, 30 ms per test)
      18:18:22 (Sampling Rate: N/A)
      18:18:22 (JVMs: 1 starting, 0 running, 1 finishing)
      18:18:22 (CPUs: 4 configured, 4 allocated)
      18:18:22 (Results: 2562600 planned; 0 passed, 0 failed, 0 soft errs, 16579 hard errs)
      ```

      It ran for 4 hours and passed nothing; in failing jobs the run is also much slower than in passing ones.

      When every test in a run is failing like this, maybe it would be beneficial to generate the report and exit with a distinctive non-zero return code, instead of rolling on just to be killed on timeout or by an angry admin?
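
      To make the idea concrete, here is a minimal sketch of such a check, written as an external filter over the console output rather than as a change to jcstress itself. The class name `AllFailingWatchdog`, the threshold `MIN_HARD_ERRORS` and the exit code `EXIT_ALL_FAILING` are all hypothetical; only the format of the `(Results: ...)` line is taken from the logs in this report.

      ```java
      import java.io.BufferedReader;
      import java.io.IOException;
      import java.io.InputStreamReader;
      import java.nio.charset.StandardCharsets;
      import java.util.regex.Matcher;
      import java.util.regex.Pattern;

      /** Hypothetical watchdog over jcstress console output; not part of jcstress. */
      final class AllFailingWatchdog {
          // Matches the progress line as it appears in the logs quoted in this report.
          private static final Pattern RESULTS = Pattern.compile(
                  "\\(Results: (\\d+) planned; (\\d+) passed, (\\d+) failed, "
                  + "(\\d+) soft errs, (\\d+) hard errs\\)");

          // Assumption: how many hard errors with zero passes count as "broken".
          static final long MIN_HARD_ERRORS = 1000;

          // Assumption: an arbitrary distinctive exit code for "everything is failing".
          static final int EXIT_ALL_FAILING = 42;

          /** True if the line is a Results line with no passes and many hard errors. */
          static boolean looksBroken(String line) {
              Matcher m = RESULTS.matcher(line);
              if (!m.find()) {
                  return false; // not a Results line
              }
              long passed = Long.parseLong(m.group(2));
              long hardErrs = Long.parseLong(m.group(5));
              return passed == 0 && hardErrs >= MIN_HARD_ERRORS;
          }

          /** Echoes stdin and exits early once a Results line looks hopeless. */
          public static void main(String[] args) throws IOException {
              BufferedReader in = new BufferedReader(
                      new InputStreamReader(System.in, StandardCharsets.UTF_8));
              String line;
              while ((line = in.readLine()) != null) {
                  System.out.println(line);
                  if (looksBroken(line)) {
                      System.err.println("No passes and too many hard errors, giving up.");
                      System.exit(EXIT_ALL_FAILING);
                  }
              }
          }
      }
      ```

      Such a filter only detects the condition from the outside: it does not stop the jcstress process and does not produce the report, which is why doing the equivalent check inside jcstress would be preferable.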

      Note that I do not have normal access to that machine. Generally, in that infra I have a 50/50 chance of getting such borked results.

      E.g. https://ci.adoptium.net/view/Test_grinder/job/Grinder/10411/console is broken too, although differently:
      ```
      10:35:58 JVM args: [-XX:+UseBiasedLocking, -XX:+StressLCM, -XX:+StressGCM, -XX:+StressIGVN, -XX:+StressCCP, -XX:StressSeed=1914935936]
      10:35:58 Fork: #2
      10:35:58
      10:35:58 Messages:
      10:35:58 Unrecoverable error while running
      10:35:58 java.lang.IllegalStateException: Failed: 22
      10:35:58 at org.openjdk.jcstress.os.AffinitySupport$Linux.set(AffinitySupport.java:135)
      10:35:58 at org.openjdk.jcstress.os.AffinitySupport$Linux.bind(AffinitySupport.java:108)
      10:35:58 at org.openjdk.jcstress.os.AffinitySupport.bind(AffinitySupport.java:37)
      10:35:58 at org.openjdk.jcstress.tests.seqcst.sync.S1__S1_L1_S2__S2__S2_Test_jcstress$JcstressThread_actor3.jcstress_iteration_actor3(S1__S1_L1_S2__S2__S2_Test_jcstress.java:559)
      10:35:58 at org.openjdk.jcstress.tests.seqcst.sync.S1__S1_L1_S2__S2__S2_Test_jcstress$JcstressThread_actor3.internalRun(S1__S1_L1_S2__S2__S2_Test_jcstress.java:552)
      10:35:58 at org.openjdk.jcstress.infra.runners.CounterThread.run(CounterThread.java:44)
      10:35:58
      10:35:58 (Time: overtime 04:41:39, 1 tests in flight, 30 ms per test)
      10:35:58 (Sampling Rate: N/A)
      10:35:58 (JVMs: 0 starting, 1 running, 0 finishing)
      10:35:58 (CPUs: 4 configured, 8 allocated)
      10:35:58 (Results: 3030696 planned; 0 passed, 0 failed, 0 soft errs, 7802 hard errs)
      10:35:58
      10:36:01
      10:36:01 ....... [ERROR] o.o.j.t.seqcst.sync.S1__S1__S1_S1_S2__S2_Test
      ```

      This one passed:
      https://ci.adoptium.net/view/Test_grinder/job/Grinder/10410/console
      ```
      04:45:15 (Time: overtime 08:58:56, 2 tests in flight, 30 ms per test)
      04:45:15 (Sampling Rate: 97.30 K/sec)
      04:45:15 (JVMs: 0 starting, 1 running, 0 finishing)
      04:45:15 (CPUs: 4 configured, 4 allocated)
      04:45:15 (Results: 491088 planned; 40541 passed, 0 failed, 0 soft errs, 0 hard errs)
      ```

      But it was killed by the 10-hour time-out (but that is a different story).

          People

            jvanek Jiří Vaněk