Type: JEP
Resolution: Delivered
Priority: P3
Fix Version/s: 9
Component/s: None
Labels:

Author: Igor Ignatyev
JEP Type: Feature
Exposure: Open
Scope: Implementation
Discussion: hotspot dash dev at openjdk dot java dot net, core dash libs dash dev at openjdk dot java dot net
Effort: XS
Duration: XS
JEP Number: 279

Summary

Automatically collect diagnostic information which can be used for further troubleshooting in case of test failures and timeouts.

Goals

Gather the following information to help diagnose test failures and timeouts:

For Java processes which are still running on a host after test failure or timeout:
- C and Java stacks
- Core dumps (minidumps on Windows)
- Heap statistics
Environment information:
- Running processes
- CPU and I/O loads
- Open files and sockets
- Free disk space and memory
- Most recent system messages and events

We will develop a library that provides this functionality and co-locate the library sources with the product code.

Motivation

It is difficult to troubleshoot intermittent test failures when there is no information about the testing environment. Such failures often depend on test execution order and concurrency, which makes them extremely difficult to reproduce.

Description

Currently, there are two extension points in the jtreg test harness. The first one is the timeout handler, which jtreg runs when a test times out. The second one is the observer, which implements the observer design pattern to track different events in a test run. We will use these extension points to gather diagnostic information and develop a custom observer and timeout handler for jtreg.

Information about the environment and non-Java processes will be collected by running platform-specific commands. Information about Java processes will be gathered via the available diagnostic commands, which are heavily extended by JEP 228, e.g., the print_vm_state command, which collects information similar to that in hs_err files. The gathered information will be stored together with the test results for later inspection. The observer will collect the information on finishedTest events when tests fail.
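As an illustration of gathering information about a Java process through diagnostic commands, the sketch below builds `jcmd` command lines for one process id. It is a minimal sketch, not the library's implementation: the class name `DiagnosticCommands`, the method `jcmdCommandsFor`, and the choice of the two actions (`Thread.print` for Java stacks, `GC.class_histogram` for heap statistics) are assumptions made for this example, and it needs Java 9 or later for `List.of` and `ProcessHandle`.

```java
import java.util.ArrayList;
import java.util.List;

public class DiagnosticCommands {
    // Illustrative selection of jcmd actions (an assumption, not the JEP's list):
    // Thread.print dumps Java stacks, GC.class_histogram prints heap statistics.
    static final List<String> JCMD_ACTIONS = List.of("Thread.print", "GC.class_histogram");

    /** Builds one jcmd command line per diagnostic action for the given process id. */
    static List<List<String>> jcmdCommandsFor(long pid) {
        List<List<String>> commands = new ArrayList<>();
        for (String action : JCMD_ACTIONS) {
            commands.add(List.of("jcmd", Long.toString(pid), action));
        }
        return commands;
    }

    public static void main(String[] args) {
        // Print the command lines that would be run against the current JVM.
        long pid = ProcessHandle.current().pid();
        for (List<String> command : jcmdCommandsFor(pid)) {
            System.out.println(String.join(" ", command));
        }
    }
}
```

Keeping command construction separate from execution makes it easy to substitute platform-specific tools for non-Java processes while reusing the same execution and storage logic.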

Since tests may create other processes, information about test processes and their child processes will be collected. To find such processes, the library will create a process tree with the original test process at the root.
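The process tree described above can be sketched with the `ProcessHandle` API from JEP 102 (Java 9 and later). This is only one possible approach under that assumption, and the names `ProcessTree`, `collect`, and `visit` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ProcessTree {
    /** Returns the root followed by its live descendants, each parent before its children. */
    static List<ProcessHandle> collect(ProcessHandle root) {
        List<ProcessHandle> result = new ArrayList<>();
        visit(root, result);
        return result;
    }

    private static void visit(ProcessHandle process, List<ProcessHandle> out) {
        out.add(process);
        // children() is a snapshot of the direct children; processes may start
        // or exit while the tree is being walked.
        process.children().forEach(child -> visit(child, out));
    }

    public static void main(String[] args) {
        // Walk the tree rooted at the current JVM and print each pid and command.
        for (ProcessHandle p : collect(ProcessHandle.current())) {
            System.out.println(p.pid() + " " + p.info().command().orElse("<unknown>"));
        }
    }
}
```

Because the tree is a snapshot, a collector built this way should tolerate processes that disappear between discovery and inspection.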

Library sources will be placed in the test directory in the top-level repository, and makefiles will be updated to build them and bundle them as a part of test bundles.

Testing

We will schedule regular testing which uses this library. When the results and test execution become stable, we will extend the use of the library to other components.

Risks and Assumptions

Commands can hang: To minimize this risk, each command will be run only for an allotted time and interrupted if it has not finished by then.
Running out of disk space on a host: The plan is to archive the collected information, limit the amount that is saved, and check free disk space before collecting anything.
Tools unavailable on a platform or host: If a tool is not available on a particular host or platform, the commands that depend on it will be skipped and a warning message will be added to the log file. Another possible solution is to download the required tools from a known tools repository.
System resource exhaustion: Some failures can exhaust system resources (CPU, memory, disk space, etc.) or be caused by a lack of resources. Since it may not be possible to run commands to gather information in these situations, command execution will be skipped to prevent further system degradation.
Getting process trees in Java: Getting the process tree in Java requires the new process API described in JEP 102, which is available only in JDK 9. Relying on it would mean using the JDK under test as the stable JDK (i.e., the JDK which runs the jtreg test harness), which may interfere with test results. To mitigate this, we will develop an alternative process-tree implementation that does not depend on the new API. That implementation will also simplify backporting this project to JDK 8.
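Three of the mitigations above (a time limit per command, a free-space check, and skipping unavailable tools) can be sketched as follows. This is a hedged illustration rather than the library's code: the class `GuardedCommand`, its method names, the 10 MB threshold, the 30-second limit, and the `diag.log` file name are all assumptions made for this example.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class GuardedCommand {
    /** True if the file store holding dir has at least minFreeBytes of usable space. */
    static boolean hasEnoughSpace(File dir, long minFreeBytes) {
        return dir.getUsableSpace() >= minFreeBytes;
    }

    /**
     * Runs the command with its output redirected to outputFile.
     * Returns true if it finished within the limit, false if it had to be killed.
     * Throws IOException if the tool cannot be started (for example, not installed).
     */
    static boolean runWithTimeout(List<String> command, File outputFile, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true)
                .redirectOutput(outputFile)
                .start();
        if (process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            return true;
        }
        process.destroyForcibly(); // allotted time is over: interrupt the command
        process.waitFor();         // wait for the killed process to be reaped
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        File dir = new File(".");
        if (!hasEnoughSpace(dir, 10L * 1024 * 1024)) {
            System.err.println("WARNING: low disk space, skipping collection");
            return;
        }
        // Use the running JVM's own launcher as a stand-in for a diagnostic tool.
        String java = ProcessHandle.current().info().command().orElse("java");
        try {
            boolean finished = runWithTimeout(List.of(java, "-version"), new File(dir, "diag.log"), 30);
            System.out.println(finished ? "finished" : "timed out and was killed");
        } catch (IOException e) {
            System.err.println("WARNING: tool unavailable, skipping: " + e.getMessage());
        }
    }
}
```

Note that `destroyForcibly` stops only the process that was started directly; a fuller implementation would also have to deal with any children that the command itself spawned.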

Is blocked by:
- JDK-8132961 Implement enhanced failures handler (Resolved)
- CODETOOLS-7901452 Make finding process id in timeout handler overridable (Closed)

Relates to:
- JDK-8151671 Enhance Test Failure Troubleshooting to support GUI tests (Open)
- JDK-8149465 Extend failure troubleshooting library to allow for stack trace comparison (Open)
- JDK-8043764 JEP 228: Add More Diagnostic Commands (Closed)
- JDK-8046092 JEP 102: Process API Updates (Closed)
- CODETOOLS-7901480 Provide more information when action is interrupted (Closed)


Assignee: Ludvig Janiuk (Inactive)
Reporter: Igor Ignatyev (Inactive)
Owner: Igor Ignatyev (Inactive)
Reviewed By: Aleksandre Iline, Brian Goetz
Endorsed By: Mikael Vidstedt
Votes: 0
Watchers: 21

Due: 2016-05-10
Created: 2015-03-20 10:09
Updated: 2024-05-20 07:54
Resolved: 2017-04-09 21:23
Integration Due: 2016-01-21
