Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-6813266

hs204t001 fails intermittently

XMLWordPrintable

    • generic
    • generic

      Here is a link to a failure after v1.6r22 was used:

      http://sqeweb.sfbay.sun.com/nfs/tools/gtee/results/JDK7/NIGHTLY/VM/2008-09-15/Main_Baseline/index.html

      Here is a snippet of the log file:

      [2008-09-16T16:15:05.53] File red completely
      [2008-09-16T16:15:05.53] hs204t001R
      [2008-09-16T16:15:05.53] started running thread..
      [2008-09-16T16:15:05.53] ... Inside doThisThread..
      [2008-09-16T16:15:05.53] State..1108280
      [2008-09-16T16:15:05.53] comming out ..
      [2008-09-16T16:15:05.53] index =1108290
      [2008-09-16T16:15:05.53] # Test level exit status: 96

      It looks like sometimes the worker thread prints the "comming out"
      line and sometimes it doesn't. Not sure why and not sure if that
      matters to the failure mode.
      Here is a link to the oldest nightly failure that I found:

      http://sqeweb.sfbay.sun.com/nfs/tools/gtee/results/JDK7/NIGHTLY/VM/2008-08-29/Main_Baseline/index.html

      Here is a snippet of the logfile:

      [2008-08-30T16:11:50.05] MyClass :: Successfully redefined..
      [2008-08-30T16:11:50.05] hs204t001R
      [2008-08-30T16:11:50.05] started running thread..
      [2008-08-30T16:11:50.05] ... In side doThisThread..
      [2008-08-30T16:11:50.05] State..149400
      [2008-08-30T16:11:50.05] comming out ..
      [2008-08-30T16:11:50.05] index =149410
      [2008-08-30T16:11:50.05] ... Failed ..
      [2008-08-30T16:11:50.05] # Test level exit status: 1

      This failure mode is the same as the 2009.03.03 failure, but
      the messages are a little bit different. The testbase for
      this failure claims to be v1.6r21. The fix for 6354035 was
      integrated in v1.6r22.
      Here is an interesting entry from my nightly analysis report:

      New nsk.jvmti failures (from 2009.03.03)
         nsk/jvmti/scenarios/hotswap/HS204/hs204t001
              This test failed due to an exit code of 96 on Linux IA32 Client
              VM -Xmixed (machine wowamd). The exit code is set as follows:

                  System.exit(Consts.JCK_STATUS_BASE + run(args, System.out));

              The run() method is just a wrapper around the runIt() method:

                  return new hs204t001().runIt(args, out);

              and runIt() returns as follows:

                  return (t1.getIndex() == afterSuspend ? 0 : 1);

              The interesting log output looks like:

                  State..9130
                  index =53640
                  # Test level exit status: 96

              The index value shown in the "State.." output line is acquired
              after the worker thread is up and running and before it is
              suspended. This index value can vary from run to run. Another
              index value is saved after the worker thread is suspended and
              before popFrame() is called. This index value can also vary
              from run to run, but should remain the same as long as the
              worker thread is suspended. The test also sets the atomic "run"
              variable to false at this time and that should prevent the
              index value from being incremented. The index value in the
              "index =" output line is acquired after popFrame() and
              t1.join() are called. The test's popFrame() function also
              resumes the worker thread. The test expects the current return
              value from t1.getIndex() to match the index value saved after
              the worker thread was suspended.

              Generally speaking, when there is an "expect != actual" value
              mismatch, the test should output both the expected value and
              the actual value. We have the actual value from the "index ="
              output line, but we don't have the expected value.

              I see a possible problem in this test. The readme file
              indicates that the suspendThread()/popFrame() logic pair
              assumes that the popFrame() will result in the worker thread's
              execution state being reset into the worker thread's run()
              method where it will call into doInThisThread() again. While I
              don't see any test logic to positively verify this assumption,
              this assertion may be faulty because it assumes that
              suspendThread() will catch the worker thread in
              doInThreadThread() when it could be in hs204t001R.run.get() or
              even hs204t001R.suspend.set() if it gets really, really lucky.
              If the worker thread is suspended in one of those methods, then
              the popFrame() will result in the worker thread's execution
              state being reset into the doInThreadThread() method and not
              into the worker thread's run() method. Depending on how
              hs204t001R.run.get() is implemented, the suspendThread() could
              catch the worker thread in a subroutine call down in get() and
              the popFrame() will return execution somewhere in get().

              If the thread was suspended in a subroutine call in get() after
              the current value of the run field was fetched, then after the
              thread is resumed, the now stale value of the run field is
              returned and we'll get one last increment of the index field
              before we loop around and see the real value of run.get().
              This would explain a failure where expect != actual and the
              difference is one increment value. Since we don't have the
              expect value in the log, we don't know the difference between
              expect and actual.

              Update: run.get() is a simple getter and doesn't call any
                  other subroutines so that sinks my one theory about why
                  expect != actual. My search of nightly results has found 30
                  instances of this test failing. Most of the failures are in
                  GC baselines, some are in Main and this one is in
                  RT_Baseline. I don't know if they all fail the same way.

      Here is a link to the run's analysis.html file:

      http://sqeweb.sfbay/nfs/results/vm/gtee/JDK7/NIGHTLY/VM/2009-03-03/RT_Baseline/vm/linux-i586/client/mixed/linux-i586_client_mixed_nsk.jvmti.testlist/analysis.html

      Here is a snippet of the failure log:

      [2009-03-04T06:45:44.20] File red completely
      [2009-03-04T06:45:44.20] hs204t001R
      [2009-03-04T06:45:44.20] started running thread..
      [2009-03-04T06:45:44.20] ... Inside doThisThread..
      [2009-03-04T06:45:44.20] State..9130
      [2009-03-04T06:45:44.20] index =53640
      [2009-03-04T06:45:44.20] # Test level exit status: 96

            Unassigned Unassigned
            dcubed Daniel Daugherty
            Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

              Created:
              Updated:
              Resolved:
              Imported:
              Indexed: