
[aix] java/lang/ProcessHandle/InfoTest.java still fails: "reported cputime less than expected"

    • Type: Bug
    • Resolution: Fixed
    • Priority: P4
    • 25
    • 21, 23, 24, 25
    • core-libs
    • None
    • b06
    • aix

      The test java/lang/ProcessHandle/InfoTest.java still fails sporadically on AIX. The test exclusion was removed through JDK-8211847 under the assumption that the problem was gone, but that assumption turned out to be wrong.

      We can see an exception like:

      java.lang.AssertionError: reported cputime less than expected: PT0.2S, actual: Optional[PT0.021179882S]
      at org.testng.Assert.fail(Assert.java:99)
      at InfoTest.test1(InfoTest.java:110)
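
The kind of comparison behind this message can be sketched as follows. This is a minimal illustration, not the code of InfoTest.java: the class `CpuTimeCheck` and the method `meetsMinimum` are hypothetical names, and only the two durations are taken from the exception above.

```java
import java.time.Duration;
import java.util.Optional;

// Hypothetical helper for illustration; InfoTest.java is structured differently.
final class CpuTimeCheck {

    /** Returns true only if a CPU time was reported and it is at least {@code expected}. */
    static boolean meetsMinimum(Optional<Duration> reported, Duration expected) {
        return reported.map(d -> d.compareTo(expected) >= 0).orElse(false);
    }

    public static void main(String[] args) {
        // Values copied from the AssertionError message.
        Duration expected = Duration.parse("PT0.2S");
        Optional<Duration> reported = Optional.of(Duration.parse("PT0.021179882S"));

        if (!meetsMinimum(reported, expected)) {
            System.out.println("reported cputime less than expected: "
                    + expected + ", actual: " + reported);
        }
    }
}
```

On a live process, the reported value would come from `ProcessHandle.Info.totalCpuDuration()`, which returns an `Optional<Duration>`.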

      After a discussion with Roger Riggs and the team, we came to the following conclusion.
      The problem has two independent causes: one fundamental and one AIX-specific.

      The fundamental cause is as follows:
      Modern hardware provides many hardware threads (up to several hundred), so the worker threads of processes can run truly in parallel. To prevent a worker thread from monopolizing a hardware thread, the OS preempts it after a few ms at the latest to make room for another worker thread, possibly from another process.
      The OS continuously sums the time during which each worker thread of a process is active and reports the total as the process CPU time.

      It follows that there is no fixed relationship between the CPU time of a process and the real time (wall time).

      On a system with many hardware threads and few worker threads, the worker threads are active almost all the time: when one is preempted, it is rescheduled immediately because nothing competes with it. If a process has several worker threads, its CPU time increases faster than the real time. In this case, CPU time > real time is to be expected, which is what the test wants.

      However, if the same system is heavily loaded, i.e. many worker threads compete for one hardware thread, each individual worker thread runs only relatively rarely. Even if a process has several worker threads, its total CPU time will be less than the elapsed real time. This is even more pronounced if the worker threads have to wait for each other on synchronization objects. Since this is the normal case, CPU time < real time usually applies.

      Therefore, such a test makes little sense in principle.
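
The load dependence described above can be observed with a small experiment. This is a rough sketch under assumptions, not part of the test: `CpuVsWall`, `ratio`, and `spin` are hypothetical names, and the thread count and duration are arbitrary. The result depends on the machine and its current load, so no particular output is claimed.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical experiment: compares process CPU time with wall time.
final class CpuVsWall {

    /** CPU time divided by wall time for the same interval. */
    static double ratio(Duration cpu, Duration wall) {
        return (double) cpu.toNanos() / wall.toNanos();
    }

    /** Busy-spins the given number of threads for roughly {@code millis} ms of wall time. */
    static void spin(int threads, long millis) throws InterruptedException {
        final long deadline = System.nanoTime() + millis * 1_000_000L;
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                while (System.nanoTime() - deadline < 0) {
                    // burn CPU until the deadline passes
                }
            });
            t.start();
            workers.add(t);
        }
        for (Thread t : workers) {
            t.join();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Optional<Duration> cpuBefore = ProcessHandle.current().info().totalCpuDuration();
        long start = System.nanoTime();
        spin(4, 500);
        Duration wall = Duration.ofNanos(System.nanoTime() - start);
        Optional<Duration> cpuAfter = ProcessHandle.current().info().totalCpuDuration();

        if (cpuBefore.isPresent() && cpuAfter.isPresent()) {
            Duration cpu = cpuAfter.get().minus(cpuBefore.get());
            // Idle multi-core machine: ratio tends above 1. Heavily loaded machine: it can drop below 1.
            System.out.println("cpu=" + cpu + " wall=" + wall + " ratio=" + ratio(cpu, wall));
        } else {
            System.out.println("CPU time not available on this platform");
        }
    }
}
```

Running it once on an idle machine and once under heavy load should show the ratio moving in the direction described, although the exact numbers vary from run to run.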

      The AIX-specific cause lies in the API used to get the CPU time: the /proc/<pid>/psinfo file is read to obtain it. On AIX, the /proc directory exists only for portability reasons, and the data in it is updated only at long intervals. For example, the CPU time is updated only every 1-2 seconds, which can cause the error.
      The better solution here would be the getprocs64() API, for which the OS kernel updates the CPU time values every few ms.
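
Why a coarse update interval can produce a too-small reading can be shown with a toy model. This is not AIX code and reads neither /proc nor getprocs64(): `StaleCounterModel` and `published` are hypothetical, the model assumes CPU time accrues linearly, and the 1 s and 5 ms intervals are stand-ins for the "1-2 seconds" and "every few ms" mentioned above.

```java
import java.time.Duration;

// Toy model of a counter that is republished only at fixed intervals.
final class StaleCounterModel {

    /**
     * Value a reader sees at wall time {@code now} if the counter is refreshed only at
     * multiples of {@code refresh}. {@code cpuPerWall} is the assumed CPU seconds
     * accrued per wall second (1.0 = one fully busy thread).
     */
    static Duration published(Duration now, Duration refresh, double cpuPerWall) {
        long lastRefreshNanos = (now.toNanos() / refresh.toNanos()) * refresh.toNanos();
        return Duration.ofNanos((long) (lastRefreshNanos * cpuPerWall));
    }

    public static void main(String[] args) {
        Duration now = Duration.ofMillis(900); // process has been fully busy for 0.9 s

        // Coarse interval: the last refresh was at t = 0, so the reader sees no CPU time yet.
        System.out.println(published(now, Duration.ofSeconds(1), 1.0));
        // Fine interval: the published value tracks the true value closely.
        System.out.println(published(now, Duration.ofMillis(5), 1.0));
    }
}
```

In this model, a check that expects at least 0.2 s of CPU time fails with the coarse interval even though the process has consumed far more.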

      Adjusting the AIX code may therefore make the error disappear, but it does not solve the problem in principle.

            jkern Joachim Kern
            clanger Christoph Langer