TimedWaitingTest frequently fails in GitHub Actions:
```
java.lang.IllegalStateException: Thread.sleep was too short: 999 ms
```
It detects that a waiting call lasted for 999 ms instead of requested 1000 ms which is not in line with the specification. The failing methods differ (Thread.sleep, Thread.join, Thread.wait, ReentrantLock.tryLock...) but the actual waiting time is always 999 ms.
TimedWaitingTest uses non-monotonic `System.currentTimeMillis()` to measure the time which in general is not correct and may cause the failures because of negative time jumps. That is why as the first attemptJDK-8351402 was created that suggested to replace `System.currentTimeMillis()` with monotonic `System.nanoTime()`.
That would fix the failing test, however:
1. CRaC modifies `System.nanoTime()` to ensure it's monotonicity so the initial intention was to rely on some other time provider to test CRaC.
2. After further investigation it was revealed that the test fails even when `System.currentTimeMillis()` remains monotonic.
This is what really happens.
Some background:
- CRaC uses os::javaTimeMillis() to offset os::javaTimeNanos(), os::javaTimeMillis() reads nanosecond-precision time and then truncates it towards 0 to millisecond precision.
- On checkpoint CRaC reads the time: checkpoint_millis = os::javaTimeMillis().
- On restore it calculates the difference between the current time and the checkpoint time: diff_millis = os::javaTimeMillis() - checkpoint_millis.
- Then diff_millis * 1 000 000 is used to offset os::javaTimeNanos() to make it seem like roughly the same amount of monotonic time passed as the wall clock time.
Now suppose the following:
- Checkpoint time reading is executed at t = 1 999 999 ns, which is converted to 1 ms, so checkpoint_millis is 1.
- Restore is executed at t = 2 000 000 ns, which is converted to 2 ms, so diff_millis = 2 - 1 = 1.
This results in os::javaTimeNanos() being offset by 1 000 000 ns instead of just 1 ns which actually passed. This makes the “nano time” to be almost 1 ms ahead of the real time which makes the test fail. Such big difference is an extreme case which hasn't been witnessed in reality but the real difference of around 0.2–0.5 ms is enough to make the test fail.
Such behavior is not incorrect per-se, however since it can be fixed by using nanosecond-precision real-time clock values instead of millisecond-precision and that would allow us to continue using the real-time clock in the test it is proposed to fix it this way.
```
java.lang.IllegalStateException: Thread.sleep was too short: 999 ms
```
It detects that a waiting call lasted for 999 ms instead of requested 1000 ms which is not in line with the specification. The failing methods differ (Thread.sleep, Thread.join, Thread.wait, ReentrantLock.tryLock...) but the actual waiting time is always 999 ms.
TimedWaitingTest uses non-monotonic `System.currentTimeMillis()` to measure the time which in general is not correct and may cause the failures because of negative time jumps. That is why as the first attempt
That would fix the failing test, however:
1. CRaC modifies `System.nanoTime()` to ensure it's monotonicity so the initial intention was to rely on some other time provider to test CRaC.
2. After further investigation it was revealed that the test fails even when `System.currentTimeMillis()` remains monotonic.
This is what really happens.
Some background:
- CRaC uses os::javaTimeMillis() to offset os::javaTimeNanos(), os::javaTimeMillis() reads nanosecond-precision time and then truncates it towards 0 to millisecond precision.
- On checkpoint CRaC reads the time: checkpoint_millis = os::javaTimeMillis().
- On restore it calculates the difference between the current time and the checkpoint time: diff_millis = os::javaTimeMillis() - checkpoint_millis.
- Then diff_millis * 1 000 000 is used to offset os::javaTimeNanos() to make it seem like roughly the same amount of monotonic time passed as the wall clock time.
Now suppose the following:
- Checkpoint time reading is executed at t = 1 999 999 ns, which is converted to 1 ms, so checkpoint_millis is 1.
- Restore is executed at t = 2 000 000 ns, which is converted to 2 ms, so diff_millis = 2 - 1 = 1.
This results in os::javaTimeNanos() being offset by 1 000 000 ns instead of just 1 ns which actually passed. This makes the “nano time” to be almost 1 ms ahead of the real time which makes the test fail. Such big difference is an extreme case which hasn't been witnessed in reality but the real difference of around 0.2–0.5 ms is enough to make the test fail.
Such behavior is not incorrect per-se, however since it can be fixed by using nanosecond-precision real-time clock values instead of millisecond-precision and that would allow us to continue using the real-time clock in the test it is proposed to fix it this way.
- relates to
-
JDK-8351402 [CRaC] Use System.nanoTime() in TimedWaitingTest
-
- Resolved
-
- links to
-
Commit(crac) openjdk/crac/ad63687e
-
Review(crac) openjdk/crac/221