JDK-4435671

VolanoMark intermittent failure on x86 with Merlin b58 -server flag

    • Type: Bug
    • Resolution: Fixed
    • Priority: P1
    • 1.4.0
    • 1.4.0
    • hotspot
    • beta2
    • x86
    • generic
    • Verified


      With the Merlin b58 -server flag, on machine jtgbp62c (Solaris 7 x86),
      VMark passed the 24-hour run, but there were intermittent failures.
      Core and log files are under /net/jtgb4u4c.eng/export/sail14/bigapps_log/solaris/merlin_b58/jtgbp62c/VolanoMark*

      Error message:
      An unexpected exception has been detected in native code outside the VM.
      Unexpected Signal : 11 occurred at PC=0xC50
      Function=[Unknown.]
      Library=(N/A)

      NOTE: We are unable to locate the function name symbol for the error
            just occurred. Please refer to release documentation for possible
            reason and solutions.
      Dynamic libraries:
      0x8048000 /usr/j2se_b58/bin/../bin/i386/native_threads/java
      0xdfbb7000 /usr/lib/libthread.so.1
      0xdfbdf000 /usr/lib/libdl.so.1
      0xdfb22000 /usr/lib/libc.so.1
      0xdecb3000 /usr/j2se_b58/jre/lib/i386/server/libjvm.so
      0xdfb05000 /usr/lib/libCrun.so.1
      0xdfafa000 /usr/lib/libsocket.so.1
      0xdfa7d000 /usr/lib/libnsl.so.1
      0xdfa70000 /usr/lib/libm.so.1
      0xdfa6b000 /usr/lib/libw.so.1
      0xdfa66000 /usr/lib/libmp.so.2
      0xdfa55000 /usr/j2se_b58/jre/lib/i386/native_threads/libhpi.so
      0xdfa3d000 /usr/j2se_b58/jre/lib/i386/libverify.so
      0xdfa19000 /usr/j2se_b58/jre/lib/i386/libjava.so
      0xdfa06000 /usr/j2se_b58/jre/lib/i386/libzip.so
      0xdd449000 /usr/j2se_b58/jre/lib/i386/libnet.so
      0xdebe7000 /usr/lib/nss_nis.so.1
      0xdf80a000 /usr/lib/straddr.so

      Local Time = Fri Apr 6 06:28:16 2001
      Elapsed Time = 17
      #
      # The exception above was detected in native code outside the VM
      #
      # Java VM: Java HotSpot(TM) Server VM (1.4.0-beta-b58 mixed mode)
      #


      Coleen Phillimore looked at the problem. Here is her diagnosis.

      Post the hs_error_pid*log in the bug report. The symptom is a segmentation fault in the C2 compiler.
       
        [6] sigacthandler(0xb, 0xdc000000, 0xbddc0000, 0x1fbddc00), at 0xdfbcd43b
        ---- called from signal handler with signal 11 (SIGSEGV) ------
        [7] emit_opcode(0x814a960, 0x8b0814a9, 0x8b0814, 0x8b08), at 0xded3f46c
        [8] encode_Copy(0x814a960, 0x10814a9, 0x10814, 0x108), at 0xded6bc51
        [9] movPNode::emit(0x83959f0, 0x60083959, 0xa9600839, 0x14a96008), at 0xded72907
        [10] Compile::Fill_buffer(0xcf1fc558, 0x2ccf1fc5, 0x852ccf1f, 0x14852ccf), at 0xdedd5751
        [11] Compile::Output(0xcf1fc558, 0xe8cf1fc5, 0xc2e8cf1f, 0x1fc2e8cf), at 0xdedd408d

      Running this with -Xcomp reveals even more problems.

      june.zhong@eng 2001-04-09


          BT2:CONVERTED DATA

          BugTraq+ Release Management Values

          COMMIT TO FIX:
          merlin-beta2

          FIXED IN:
          merlin-beta2

          INTEGRATED IN:
          merlin-beta2

          VERIFIED IN:
          merlin-beta2


          J. Duke added a comment -
          BT2:EVALUATION

          I am not able to reproduce the particular failure in emit_opcode()
          that was reported, but I am able to reproduce intermittent failures
          when running with -Xcomp. The failures usually occur during GC.
          The SEGV occurs because an oop header is corrupted. This is almost
          certainly due to a GC missing an oop. The failures have become
          much less frequent since the fix for 4431764 was put back, so that
          bug appears to have been the cause of some of the failures. I suspect
          there are either more instances of the VM holding a raw oop during
          an allocation, as described in 4431764, or there is a bad oopmap
          being generated by C2.

          steve.dever@Eng 2001-05-21


          The problem was in the inline allocation code used for TLS allocation.
          The load of "eden_top" in the allocation sequence would occasionally
          get moved past a safepoint. The situation can arise with an allocation
          inside loops nested two or more deep. The following program
          demonstrates the problem:

          public class alloc {
            public static void main( String args[] ) {
              foo(12000,true);
              foo(12000,true);
              foo(12000,true);
            }
            static int foo( int N, boolean P ) {
              int sum = 0;
              for( int i = 0; P && i < N; i++ ) {
                for( int j = 0; j < 2; j++ )
                  sum += i;
                new alloc();
              }
              return sum;
            }
          }

          The bug can be observed by examining the code generated for alloc.foo().
          An actual failure caused by this bug is very rare because it will only
          happen if a GC occurs in the small window between the load of the eden_top
          and the safepoint.
            
          The fix adds a control input to the load nodes for the eden_top
          and eden_end to prevent them from being moved across the safepoint.
          This is a work-around for the problem.
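
          To make the ordering problem concrete, here is a minimal toy model in
          Java. It is not HotSpot code: the class EdenTopSketch, its fields, and
          the safepointWithGc() helper are hypothetical names invented for
          illustration, and the "GC" simply resets the allocation pointer. It
          only shows that a value of eden_top loaded before a safepoint can be
          stale once the safepoint has been crossed.

```java
// Toy model only; none of these names exist in HotSpot.
public class EdenTopSketch {
    static final int EDEN_START = 0;
    static int edenTop = EDEN_START;   // bump-allocation pointer

    // Hypothetical stand-in for a safepoint at which a GC runs and empties eden.
    static void safepointWithGc() {
        edenTop = EDEN_START;
    }

    // Intended order: eden_top is loaded after the safepoint.
    static int allocateAfterSafepoint(int size) {
        safepointWithGc();
        int top = edenTop;             // sees the post-GC value
        edenTop = top + size;
        return top;                    // "address" of the new object
    }

    // Problem order: the load of eden_top has moved above the safepoint.
    static int allocateWithHoistedLoad(int size) {
        int top = edenTop;             // pre-GC value
        safepointWithGc();             // GC changes edenTop behind our back
        edenTop = top + size;          // writes back a pointer derived from stale state
        return top;
    }

    public static void main(String[] args) {
        edenTop = 16;
        System.out.println("load after safepoint:  object at " + allocateAfterSafepoint(8));
        edenTop = 16;
        System.out.println("load hoisted above it: object at " + allocateWithHoistedLoad(8));
    }
}
```

          In this model the first call places the object at 0, the start of the
          freshly emptied eden, while the second places it at 16, an address
          computed from the state before the collection. Pinning the loads with
          a control input, as the fix does, corresponds to forcing the first
          ordering.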
            
          The "correct" way to fix this would be to have a SafePointNode
          update the raw memory state. This would create a graph which
          expresses the exact semantics we want. However, the loop
          optimization is not currently set up to handle this. Loop
          optimizations would be seriously degraded. Fixing the loop
          optimizations would be a significant change which is too risky to
          try to get into beta-refresh.
            
          We plan to investigate implementing the correct fix in the next
          release.


          steve.dever@Eng 2001-07-09


            Assignee: sdeversunw Steve Dever (Inactive)
            Reporter: jzhongsunw June Zhong (Inactive)