Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-4073824

green threads, garbage collector dies with SIGSEGV on Sol 2.5.1, JDK 1.1.3

XMLWordPrintable

    • sparc
    • solaris_2.5.1

      Notes from customer:

      **********************
      **********************

      Greetings:

      We have discovered a significant flaw within the Java Virtual Machine
      (VM), and request your assistance. The particulars:

          * Sun JDK release 1.1.3.
          * Solaris 2.5.1, UltraSparc Creator.
          * 128 MB RAM, 600MB swap.
          * VM arguments: -mx350m -ms25m -verbosegc

      Our project is a mid- to large-size application (just over 200
      classes) that models a network of our central office ATM switches. The
      software is 100% Java, and is best classified as a "long-running,
      system-software" type application. Within the simulator, it is not
      uncommon to have many thousands of extant objects, representing
      network elements, interface cards, connections, etc.

      The simulator is in beta release, and we have begun some "stress
      testing" that will mimic a reasonably-loaded ATM switch. Yesterday, we
      were loading a "switch" with approximately 5000 connections when the
      VM failed with a SIGSEGV. The loading process was 75% completed. We
      noticed that, at the point of failure, the garbage collector had just
      begun executing. We initiated a second run of the simulator; this time
      we got slightly further before the VM failed, same cause. We examined
      the core file with dbx; most of the stack had been hosed, but we did
      get a partial trace. What follows is the data provided by dbx, plus a
      snapshot of the VM garbage collector diagnostics prior to and at the
      point of VM failure:

      ----------- cut here --------------------

      ********** dbx output ********************

      Mr. Peabody: dbx /usr/local/java/JDK/bin/sparc/green_threads/java core.08211706
      Reading symbolic information for /usr/local/java/JDK/bin/sparc/green_threads/java
      core file header read successfully
      Reading symbolic information for rtld /usr/lib/ld.so.1
      Reading symbolic information for /usr/local/java/JDK/lib/sparc/green_threads/libjava.so
      Reading symbolic information for /opt/SUNWspro/lib/libm.so.1
      Reading symbolic information for /usr/lib/libdl.so.1
      Reading symbolic information for /usr/openwin/lib/libX11.so.4
      Reading symbolic information for /usr/lib/libsocket.so.1
      Reading symbolic information for /usr/lib/libnsl.so.1
      Reading symbolic information for /usr/lib/libc.so.1
      Reading symbolic information for /usr/openwin/lib/libXext.so.0
      Reading symbolic information for /usr/lib/libw.so.1
      Reading symbolic information for /usr/lib/libintl.so.1
      Reading symbolic information for /usr/lib/libmp.so.1
      Reading symbolic information for /usr/platform/SUNW,Ultra-1/lib/libc_psr.so.1
      Reading symbolic information for /usr/local/java/JDK/lib/sparc/green_threads/libnet.so
      Reading symbolic information for /usr/lib/nss_nis.so.1
      program terminated by signal SEGV (Segmentation Fault)
      dbx: core file read error: address 0xd93a1ddc not in data space
      dbx: core file read error: address 0xd93a1ddc not in data space
      dbx: attempt to read stack failed - bad frame pointer
      (dbx 1) where
      =>[1] markChildren(0xdcfbb998, 0xdd9fb780, 0x4898, 0xd8800000, 0x3, 0x1e), at 0xef6dbed8
      dbx: core file read error: address 0xd93a1ddc not in data space
      dbx: attempt to read stack failed - bad frame pointer
      (dbx 2)

      ************* last successful GC invocation **************

      <GC: managing allocation failure. need 840 bytes, type=1, action=1>
      <GC: freed 207271 objects, 7968160 bytes in 3130 ms, 24% free (11083904/45481976
      )>
        <GC: init&scan: 6 ms, scan handles: 2889 ms, sweep: 235 ms, compact: 0 ms>
      <GC: managing allocation failure. need 840 bytes, type=1, action=2>
      <GC: managing allocation failure. need 840 bytes, type=1, action=3>
      <GC: managing allocation failure. need 840 bytes, type=1, action=4>
      <GC: expanded object space by 385024 to 45867000 bytes, 25% free>

      ************ point of VM failure *******************

      <GC: managing allocation failure. need 840 bytes, type=1, action=1>
      Segmentation Fault (core dumped)

      -------------- cut here -------------------

      This is all the concrete data we have, other than simulator logs and
      the core file. We are suspicious, however, that something is amiss
      within the garbage collector. We base this supposition on data gathered
      during close monitoring of the GC output during the two simulator
      runs.

      It *appears* that, at some point, the GC becomes misinformed
      about the amount of memory actually available and begins unnecessarily
      requesting new chunks. Refering to the "last successful GC invocation"
      above, you'll notice the GC was searching for 840 bytes and couldn't
      find them. Consequently, he began his collection process and freed
      7968160 bytes. Now, looking at the line that indicates free memory
      (24% free), it seems to us that, prior to the GC's sweep he had
      11083904 - 7968160 = 3115744 bytes available. Could the GC heap be so
      fragmented that it wasn't possible to allocate an 840 byte chunk?

      An important point here. During both simulator runs, the OS had
      *plenty* of virtual memory available; a shortage of swap space is not
      an issue here.

      Please note that this analysis is *very* rough work; we are unfamiliar
      (other than surface details) with the GC's actual algorithm and may be
      drawing incorrect conclusions.

      We suspect it will be difficult to provide you with an isolated code
      fragment that manifests the VM failure. Small test applications that
      simply create a slew of objects run with no difficulty. However, we're
      willing to provide you with as much data as required to solve this
      problem (assuming non-disclosure).

      A final note. As previously mentioned, our product is 100% Java; there
      is *no* native code involved, other than what exists within and is
      required by the VM and JDK.

      Regards,

      --

      -------------------------------------------------------------
      David E. Young
      Fujitsu Network Communications "I claim not to have controlled
      (###@###.###) events, but confess plainly
                                       that events have controlled me."
                                        -- Abraham Lincoln (1864)
      "Apology is Policy"

            never Tom Rodriguez
            duke J. Duke
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

              Created:
              Updated:
              Resolved:
              Imported:
              Indexed: