JDK-6390007

cacao crashed on both clusters while geo cluster was starting on the nodes

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: P3
    • None
    • 1.1
    • hotspot
    • None
    • generic
    • solaris_10

      Running:

        - s10 + patches
        - sc31u4 + 120500-04 & 120489-01
        - cacao 1.1 + 120675-01
        - odyssey R2 2/21/06 nightly

      Problem:

      I was starting geo cluster on 2 clusters that have a partnership defined between them, and odyssey failed to start because cacao went down. The console messages from each node are below; the corresponding cacao logs are attached. Note that failover of the odyssey infrastructure to the backup node succeeded. Some time later I switched the geo-infrastructure rg back to the nodes where the failure occurred, and odyssey started up fine at that time.

      ***On phys-sabre-1 (1st cluster) -

      # geoadm start
      ... checking for management agent ...
      ... management agent check done ....
      ... starting product infrastructure ... please wait ...
      #
      [thread 144 also had an error]
      # An unexpected error has been detected by HotSpot Virtual Machine:
      #
      # SIGBUS (0xa) at pc=0xf03d8428, pid=27182, tid=45
      #
      # Java VM: Java HotSpot(TM) Server VM (1.5.0_06-b05 mixed mode)
      # Problematic frame:
      # C [libscrgadm.so.1+0x8428]
      #
      # An error report file with more information is saved as hs_err_pid27182.log
      #
      # If you would like to submit a bug report, please visit:
      # http://java.sun.com/webapps/bugreport/crash.jsp
      #
      Feb 23 15:41:06 phys-sabre-1 cacao[27180]: SUNWcacao launcher : cacao exited abnormaly

      Feb 23 15:41:06 phys-sabre-1 cacao[27180]: SUNWcacao launcher : no retries available, stop monitoring of cacao

      Feb 23, 2006 3:41:06 PM GenericConenctor RequestHandler-connectionException
      WARNING: java.io.EOFException
      java.io.EOFException
              at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2502)
              at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1267)
              at java.io.ObjectInputStream.readObject(ObjectInputStream.java:339)
              at com.sun.jmx.remote.socket.SocketConnection.readMessage(SocketConnection.java:211)
              at com.sun.jmx.remote.generic.ClientSynchroMessageConnectionImpl$MessageReader.run(ClientSynchroMessageConnectionImpl.java:391)
              at com.sun.jmx.remote.opt.util.ThreadService$ThreadServiceJob.run(ThreadService.java:208)
              at com.sun.jmx.remote.opt.util.JobExecutor.run(JobExecutor.java:59)
      Feb 23, 2006 3:41:06 PM ClientCommunicatorAdmin restart
      WARNING: Failed to restart: java.net.ConnectException: Connection refused
      Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
              at $Proxy0.startFailoverGroup(Unknown Source)
              at ServiceControl.main(ServiceControl.java:96)
      Caused by: javax.management.remote.generic.ConnectionClosedException: The connection has been closed by the server.
              at com.sun.jmx.remote.generic.ClientSynchroMessageConnectionImpl.close(ClientSynchroMessageConnectionImpl.java:338)
              at javax.management.remote.generic.GenericConnector.close(GenericConnector.java:276)
              at javax.management.remote.generic.GenericConnector.close(GenericConnector.java:231)
              at javax.management.remote.generic.ClientIntermediary$GenericClientCommunicatorAdmin.doStop(ClientIntermediary.java:839)
              at com.sun.jmx.remote.opt.internal.ClientCommunicatorAdmin.restart(ClientCommunicatorAdmin.java:133)
              at com.sun.jmx.remote.opt.internal.ClientCommunicatorAdmin.gotIOException(ClientCommunicatorAdmin.java:34)
              at javax.management.remote.generic.GenericConnector$RequestHandler.connectionException(GenericConnector.java:667)
              at com.sun.jmx.remote.generic.ClientSynchroMessageConnectionImpl$MessageReader.run(ClientSynchroMessageConnectionImpl.java:398)
              at com.sun.jmx.remote.opt.util.ThreadService$ThreadServiceJob.run(ThreadService.java:208)
              at com.sun.jmx.remote.opt.util.JobExecutor.run(JobExecutor.java:59)
      Feb 23, 2006 3:41:08 PM ClientIntermediary close
      INFO: java.io.IOException: The connection is not currently established.
      java.io.IOException: The connection is not currently established.
              at com.sun.jmx.remote.generic.ClientSynchroMessageConnectionImpl.checkState(ClientSynchroMessageConnectionImpl.java:567)
              at com.sun.jmx.remote.generic.ClientSynchroMessageConnectionImpl.sendOneWay(ClientSynchroMessageConnectionImpl.java:161)
              at javax.management.remote.generic.GenericConnector.close(GenericConnector.java:260)
              at javax.management.remote.generic.GenericConnector.close(GenericConnector.java:231)
              at javax.management.remote.generic.ClientIntermediary$GenericClientCommunicatorAdmin.doStop(ClientIntermediary.java:839)
              at com.sun.jmx.remote.opt.internal.ClientCommunicatorAdmin.restart(ClientCommunicatorAdmin.java:133)
              at com.sun.jmx.remote.opt.internal.ClientCommunicatorAdmin.gotIOException(ClientCommunicatorAdmin.java:34)
              at javax.management.remote.generic.GenericConnector$RequestHandler.connectionException(GenericConnector.java:667)
              at com.sun.jmx.remote.generic.ClientSynchroMessageConnectionImpl$MessageReader.run(ClientSynchroMessageConnectionImpl.java:398)
              at com.sun.jmx.remote.opt.util.ThreadService$ThreadServiceJob.run(ThreadService.java:208)
              at com.sun.jmx.remote.opt.util.JobExecutor.run(JobExecutor.java:59)
      Feb 23 15:41:08 phys-sabre-1 SC[SUNW.scmasa,geo-infrastructure,geo-failovercontrol,scmasa_svc_start]: Failed to start /usr/cluster/lib/rgm/rt/hamasa/cmas_service_ctrl_start geo-infrastructure.

      Feb 23 15:41:08 phys-sabre-1 Cluster.RGM.rgmd: Method <scmasa_svc_start> failed on resource <geo-failovercontrol> in resource group <geo-infrastructure> [exit code <1>, time used: 6% of timeout <600 seconds>]

      Feb 23, 2006 3:41:10 PM ServiceControl main
      WARNING: Unable to connect to the CACAO agent. The agent may be down or restarting
      Feb 23 15:41:19 phys-sabre-1 ip: TCP_IOC_ABORT_CONN: local = 010.006.173.091:0, remote = 000.000.000.000:0, start = -2, end = 6

      Feb 23 15:41:19 phys-sabre-1 ip: TCP_IOC_ABORT_CONN: aborted 0 connection

      Registering resource type <SUNW.HBmonitor>...done.
      Resource type <SUNW.scmasa> has been registered already
      Creating failover resource group <geo-clusterstate>...done.
      Creating failover resource group <geo-infrastructure>...done.
      Creating logical host resource <geo-clustername>...
      Logical host resource created successfully ....
      Creating resource <geo-hbmonitor> ...done.
      Creating resource <geo-failovercontrol> ...done.
      Bringing RG <geo-infrastructure> to managed state ...done.
      Enabling resource <geo-clustername> ...done.
      Enabling resource <geo-hbmonitor> ...done.
      Enabling resource <geo-failovercontrol> ...done.
      Node phys-sabre-1: Bringing resource group <geo-infrastructure> online ...scswitch: Resource group geo-infrastructure failed to start on chosen node and may fail over to other node(s)
      FAILED: scswitch -z -g geo-infrastructure -h phys-sabre-1

      #

      ***On phys-sabre-3 (2nd cluster) -

      # geoadm start
      ... checking for management agent ...
      ... management agent check done ....
      ... starting product infrastructure ... please wait ...
      [thread 45 also had an error]#
      # An unexpected error has been detected by HotSpot Virtual Machine:
      #
      # SIGBUS (0xa) at pc=0xf04a8298, pid=17957, tid=157
      #
      # Java VM: Java HotSpot(TM) Server VM (1.5.0_06-b05 mixed mode)
      # Problematic frame:
      # C [libscrgadm.so.1+0x8298]
      #

      # An error report file with more information is saved as hs_err_pid17957.log
      #
      # If you would like to submit a bug report, please visit:
      # http://java.sun.com/webapps/bugreport/crash.jsp
      #
      Feb 23 15:45:50 phys-sabre-3 cacao[17955]: SUNWcacao launcher : cacao exited abnormaly

      Feb 23 15:45:50 phys-sabre-3 cacao[17955]: SUNWcacao launcher : no retries available, stop monitoring of cacao

      Feb 23, 2006 3:45:50 PM GenericConenctor RequestHandler-connectionException
      WARNING: java.io.EOFException
      java.io.EOFException
              at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2502)
              at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1267)
              at java.io.ObjectInputStream.readObject(ObjectInputStream.java:339)
              at com.sun.jmx.remote.socket.SocketConnection.readMessage(SocketConnection.java:211)
              at com.sun.jmx.remote.generic.ClientSynchroMessageConnectionImpl$MessageReader.run(ClientSynchroMessageConnectionImpl.java:391)
              at com.sun.jmx.remote.opt.util.ThreadService$ThreadServiceJob.run(ThreadService.java:208)
              at com.sun.jmx.remote.opt.util.JobExecutor.run(JobExecutor.java:59)
      Feb 23, 2006 3:45:50 PM ClientCommunicatorAdmin restart
      WARNING: Failed to restart: java.net.ConnectException: Connection refused
      Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
              at $Proxy0.startFailoverGroup(Unknown Source)
              at ServiceControl.main(ServiceControl.java:96)
      Caused by: javax.management.remote.generic.ConnectionClosedException: The connection has been closed by the server.
              at com.sun.jmx.remote.generic.ClientSynchroMessageConnectionImpl.close(ClientSynchroMessageConnectionImpl.java:338)
              at javax.management.remote.generic.GenericConnector.close(GenericConnector.java:276)
              at javax.management.remote.generic.GenericConnector.close(GenericConnector.java:231)
              at javax.management.remote.generic.ClientIntermediary$GenericClientCommunicatorAdmin.doStop(ClientIntermediary.java:839)
              at com.sun.jmx.remote.opt.internal.ClientCommunicatorAdmin.restart(ClientCommunicatorAdmin.java:133)
              at com.sun.jmx.remote.opt.internal.ClientCommunicatorAdmin.gotIOException(ClientCommunicatorAdmin.java:34)
              at javax.management.remote.generic.GenericConnector$RequestHandler.connectionException(GenericConnector.java:667)
              at com.sun.jmx.remote.generic.ClientSynchroMessageConnectionImpl$MessageReader.run(ClientSynchroMessageConnectionImpl.java:398)
              at com.sun.jmx.remote.opt.util.ThreadService$ThreadServiceJob.run(ThreadService.java:208)
              at com.sun.jmx.remote.opt.util.JobExecutor.run(JobExecutor.java:59)
      Feb 23, 2006 3:45:52 PM ClientIntermediary close
      INFO: java.io.IOException: The connection is not currently established.
      java.io.IOException: The connection is not currently established.
              at com.sun.jmx.remote.generic.ClientSynchroMessageConnectionImpl.checkState(ClientSynchroMessageConnectionImpl.java:567)
              at com.sun.jmx.remote.generic.ClientSynchroMessageConnectionImpl.sendOneWay(ClientSynchroMessageConnectionImpl.java:161)
              at javax.management.remote.generic.GenericConnector.close(GenericConnector.java:260)
              at javax.management.remote.generic.GenericConnector.close(GenericConnector.java:231)
              at javax.management.remote.generic.ClientIntermediary$GenericClientCommunicatorAdmin.doStop(ClientIntermediary.java:839)
              at com.sun.jmx.remote.opt.internal.ClientCommunicatorAdmin.restart(ClientCommunicatorAdmin.java:133)
              at com.sun.jmx.remote.opt.internal.ClientCommunicatorAdmin.gotIOException(ClientCommunicatorAdmin.java:34)
              at javax.management.remote.generic.GenericConnector$RequestHandler.connectionException(GenericConnector.java:667)
              at com.sun.jmx.remote.generic.ClientSynchroMessageConnectionImpl$MessageReader.run(ClientSynchroMessageConnectionImpl.java:398)
              at com.sun.jmx.remote.opt.util.ThreadService$ThreadServiceJob.run(ThreadService.java:208)
              at com.sun.jmx.remote.opt.util.JobExecutor.run(JobExecutor.java:59)
      Feb 23 15:45:52 phys-sabre-3 SC[SUNW.scmasa,geo-infrastructure,geo-failovercontrol,scmasa_svc_start]: Failed to start /usr/cluster/lib/rgm/rt/hamasa/cmas_service_ctrl_start geo-infrastructure.

      Feb 23 15:45:52 phys-sabre-3 Cluster.RGM.rgmd: Method <scmasa_svc_start> failed on resource <geo-failovercontrol> in resource group <geo-infrastructure> [exit code <1>, time used: 15% of timeout <600 seconds>]

      Feb 23, 2006 3:45:54 PM ServiceControl main
      WARNING: Unable to connect to the CACAO agent. The agent may be down or restarting
      Feb 23 15:46:03 phys-sabre-3 ip: TCP_IOC_ABORT_CONN: local = 010.006.173.096:0, remote = 000.000.000.000:0, start = -2, end = 6

      Feb 23 15:46:03 phys-sabre-3 ip: TCP_IOC_ABORT_CONN: aborted 0 connection

      Registering resource type <SUNW.HBmonitor>...done.
      Resource type <SUNW.scmasa> has been registered already
      Creating failover resource group <geo-clusterstate>...done.
      Creating failover resource group <geo-infrastructure>...done.
      Creating logical host resource <geo-clustername>...
      Logical host resource created successfully ....
      Creating resource <geo-hbmonitor> ...done.
      Creating resource <geo-failovercontrol> ...done.
      Bringing RG <geo-infrastructure> to managed state ...done.
      Enabling resource <geo-clustername> ...done.
      Enabling resource <geo-hbmonitor> ...done.
      Enabling resource <geo-failovercontrol> ...done.
      Node phys-sabre-3: Bringing resource group <geo-infrastructure> online ...scswitch: Resource group geo-infrastructure failed to start on chosen node and may fail over to other node(s)
      FAILED: scswitch -z -g geo-infrastructure -h phys-sabre-3

      #
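      Triage note: the two crash headers above can be compared mechanically. The following is a minimal sketch, assuming only the header format visible in the two console dumps; the function name `parse_crash_header` and the `HEADERS` table are hypothetical helpers introduced for illustration and are not part of any HotSpot or cacao tooling. It extracts the signal, pc, pid, tid and problematic frame, and infers the library load address as pc minus the frame offset.

```python
import re

# Header lines copied from the two console dumps in this report.
HEADERS = {
    "phys-sabre-1": (
        "# SIGBUS (0xa) at pc=0xf03d8428, pid=27182, tid=45\n"
        "# C [libscrgadm.so.1+0x8428]\n"
    ),
    "phys-sabre-3": (
        "# SIGBUS (0xa) at pc=0xf04a8298, pid=17957, tid=157\n"
        "# C [libscrgadm.so.1+0x8298]\n"
    ),
}

# Matches e.g. "# SIGBUS (0xa) at pc=0xf03d8428, pid=27182, tid=45"
SIGNAL_RE = re.compile(
    r"#\s*(SIG\w+) \((0x[0-9a-f]+)\) at pc=(0x[0-9a-f]+), pid=(\d+), tid=(\d+)"
)
# Matches e.g. "# C [libscrgadm.so.1+0x8428]"
FRAME_RE = re.compile(r"#\s*C\s+\[(\S+?)\+(0x[0-9a-f]+)\]")


def parse_crash_header(text):
    """Extract signal, pc, pid, tid, library and offset from a crash header."""
    sig = SIGNAL_RE.search(text)
    frame = FRAME_RE.search(text)
    if sig is None or frame is None:
        raise ValueError("not a recognised crash header")
    pc = int(sig.group(3), 16)
    offset = int(frame.group(2), 16)
    return {
        "signal": sig.group(1),
        "pc": pc,
        "pid": int(sig.group(4)),
        "tid": int(sig.group(5)),
        "library": frame.group(1),
        "offset": offset,
        # Inferred load address of the library: pc minus the frame offset.
        "base": pc - offset,
    }


if __name__ == "__main__":
    for node, text in HEADERS.items():
        info = parse_crash_header(text)
        print(node, info["signal"], info["library"],
              hex(info["offset"]), hex(info["base"]))
```

      On both nodes the faulting frame is native code in libscrgadm.so.1 and the signal is SIGBUS, but the offsets differ (0x8428 on phys-sabre-1, 0x8298 on phys-sabre-3), so the two crashes did not occur at the same instruction. Mapping the offsets to symbols would require the library itself or the attached hs_err files.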

            collins Gary Collins (Inactive)
            manieto Maria Nieto