JDK-6496038

JMXMP connector, failed to send a connectionLost notification


    • team
    • CPU: generic, sparc
    • OS: solaris_8, solaris_9, solaris_10

        Here is the info from the client:

        About a year ago, we had a case about a missing connection-lost notification: when the logical host for the destination address of the connection failed over from one node of the remote cluster to another, the connection loss was detected but no connection-lost notification was generated. You later provided a module that would help locate the cause, or even fix the problem. However, we were not able to reproduce it, so it remained a mystery.

        We ran into the same problem again. The clusters are sabre1 (phys-sabre-1, phys-sabre-2) and sabre2 (phys-sabre-3, phys-sabre-4). Here are some highlights -

        On cluster sabre1, logical host "sabre1" failed over from phys-sabre-1 to phys-sabre-2 -
        Nov 14 09:45:52 phys-sabre-2 Cluster.RGM.rgmd: [ID 922363 daemon.notice] resource geo-clustername status msg on node phys-sabre-1 change to <LogicalHostname offline.>
        Nov 14 09:45:53 phys-sabre-2 Cluster.RGM.rgmd: [ID 922363 daemon.notice] resource geo-clustername status msg on node phys-sabre-2 change to <LogicalHostname online.>

        The connection loss was detected by cluster sabre2 (the system clocks of the two clusters differ by about 11-12 minutes) -
        Nov 14 09:57:36 phys-sabre-3 cacao[26093]: [ID 702911 daemon.warning] ClientCommunicatorAdmin.Checker-run : Failed to check the connection: java.io.InterruptedIOException: Waiting response timeout: 30000
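
        The 30-second "Waiting response timeout" above comes from the connector client's periodic connection checker. As a minimal sketch, assuming Sun's JMX Remote implementation and its non-standard jmx.remote.x.* client properties (the property names and values below are assumptions to verify against the deployed version, not something confirmed by this report), the checker period and response timeout could be tuned through the environment map passed at connect time:

        import java.util.HashMap;
        import java.util.Map;
        import javax.management.remote.JMXConnector;
        import javax.management.remote.JMXConnectorFactory;
        import javax.management.remote.JMXServiceURL;

        public class CheckerTuning {
            public static void main(String[] args) throws Exception {
                Map<String, Object> env = new HashMap<String, Object>();
                // Non-standard "jmx.remote.x" client properties assumed to be understood
                // by Sun's JMX Remote implementation: how often the client checker pings
                // the server, and how long it waits for a response before treating the
                // connection as lost.
                env.put("jmx.remote.x.client.connection.check.period", Long.valueOf(60000));
                env.put("jmx.remote.x.request.waiting.timeout", Long.valueOf(30000));

                // Placeholder JMXMP address; the failing setup connects through the
                // logical host "sabre1". JMXMP needs the JMX Remote optional package
                // (jmxremote_optional.jar) on the classpath.
                JMXServiceURL url = new JMXServiceURL("service:jmx:jmxmp://sabre1:9999");
                JMXConnector connector = JMXConnectorFactory.connect(url, env);
                System.out.println("Connected: " + connector.getConnectionId());
                connector.close();
            }
        }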

        Is there any way to check whether a JMXConnectionNotification was generated? Odyssey had subscribed to the notification, but did not seem to receive any.
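
        For reference, connection-state notifications (jmx.remote.connection.opened, .closed, .failed, .notifs.lost) are emitted by the JMXConnector itself rather than by any remote MBean, so a listener has to be registered on the connector with addConnectionNotificationListener. A minimal sketch of such a logger (the JMXMP address and port are placeholders, not taken from this deployment):

        import javax.management.Notification;
        import javax.management.NotificationListener;
        import javax.management.remote.JMXConnectionNotification;
        import javax.management.remote.JMXConnector;
        import javax.management.remote.JMXConnectorFactory;
        import javax.management.remote.JMXServiceURL;

        public class ConnectionNotificationLogger {
            public static void main(String[] args) throws Exception {
                // Placeholder JMXMP address; requires the JMX Remote optional package.
                JMXServiceURL url = new JMXServiceURL("service:jmx:jmxmp://sabre1:9999");
                JMXConnector connector = JMXConnectorFactory.connect(url, null);

                // Every connection-state change (opened, closed, failed, notifs lost)
                // arrives here as a JMXConnectionNotification.
                NotificationListener listener = new NotificationListener() {
                    public void handleNotification(Notification n, Object handback) {
                        if (n instanceof JMXConnectionNotification) {
                            JMXConnectionNotification cn = (JMXConnectionNotification) n;
                            System.out.println(cn.getTimeStamp() + " " + cn.getType()
                                    + " connectionId=" + cn.getConnectionId());
                        }
                    }
                };
                connector.addConnectionNotificationListener(listener, null, null);

                // Keep the client alive long enough to observe a connection-lost event.
                Thread.sleep(Long.MAX_VALUE);
            }
        }

        Running a logger like this on the client side would show whether the connector ever emitted a connection-lost notification that was then dropped downstream, or never emitted one at all.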

        Before the connection loss was detected, phys-sabre-3 appeared to be having failures with the network adapter hosting sabre2, the logical host used to connect to cluster sabre1.

        Nov 14 09:57:16 phys-sabre-3 in.mpathd[80]: [ID 215189 daemon.error] The link has gone down on eri0
        Nov 14 09:57:16 phys-sabre-3 in.routed[353]: [ID 238047 daemon.warning] interface eri0 to 10.6.173.85 turned off
        Nov 14 09:57:16 phys-sabre-3 eri: [ID 786680 kern.notice] SUNW,eri0 : No response from Ethernet network : Link down -- cable problem?
        Nov 14 09:57:18 phys-sabre-3 in.mpathd[80]: [ID 820239 daemon.error] The link has come up on eri0
        Nov 14 09:57:18 phys-sabre-3 eri: [ID 786680 kern.notice] SUNW,eri0 : 100 Mbps full duplex link up
        Nov 14 09:57:18 phys-sabre-3 in.routed[353]: [ID 300549 daemon.warning] interface eri0 to 10.6.173.85 restored

        Nov 14 09:57:32 phys-sabre-3 in.mpathd[80]: [ID 299542 daemon.error] NIC repair detected on eri0 of group sc_ipmp0

        This happened repeatedly.

        Cluster sabre2 also periodically made connections to cluster sabre1; these connections returned successfully despite the logical-host failover on cluster sabre1 and the unstable local network adapter.

        Would you be able to help us? The clusters are still in this state. Root password is "fu_bar".

        Let me know if you need more information.

              Assignee: sjiang Shanliang Jiang (Inactive)
              Reporter: sjiang Shanliang Jiang (Inactive)