Xing Qizheng (GitHub: MaxXSoft) will own this issue.
This patch changes the algorithm of `Node::dominates` to make the result more precise, and allows the iterators of `ConcurrentHashMap` to be scalar replaced.
The previous algorithm returns a conservative result when it encounters dead control flow, and it tries only the first two input paths of a multi-input Region node, which can prevent scalar replacement in some cases.
For example, with G1 GC enabled, C2 generates GC barriers for `ConcurrentHashMap` iteration operations in some early phases and eliminates them in a later IGVN, but `LoadNode` is idealized in that same IGVN. As a result, `LoadNode::Ideal` sees some dead barrier control flow and, because of the conservative result from `Node::dominates`, refuses to split some instance field loads through Phi. Scalar replacement therefore cannot be applied to the iterators in the later macro elimination phase.
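To make the difference between a conservative and a more precise answer concrete, here is a toy model in Java. It is not HotSpot's `Node::dominates` and not the algorithm in this patch: the `Ctrl` class, the `dominates` helper, and the `skipDeadInputs` flag are all hypothetical, the graph is assumed to be acyclic, and a `null` input stands in for a dead control path.

```java
import java.util.ArrayList;
import java.util.List;

public class DominatesSketch {
    /** Hypothetical control node; a null entry in `inputs` models a dead path. */
    static final class Ctrl {
        final String name;
        final List<Ctrl> inputs = new ArrayList<>();

        Ctrl(String name, Ctrl... ins) {
            this.name = name;
            for (Ctrl in : ins) {
                inputs.add(in); // nulls are allowed on purpose
            }
        }
    }

    /**
     * Toy dominance query on an acyclic control graph.
     * With skipDeadInputs == false, any dead input yields a conservative "false".
     * With skipDeadInputs == true, dead inputs are ignored and `dom` must
     * dominate every remaining live input.
     */
    static boolean dominates(Ctrl dom, Ctrl sub, boolean skipDeadInputs) {
        if (sub == dom) {
            return true;
        }
        if (sub.inputs.isEmpty()) {
            return false; // reached the root without meeting `dom`
        }
        boolean sawLiveInput = false;
        for (Ctrl in : sub.inputs) {
            if (in == null) {
                if (!skipDeadInputs) {
                    return false; // conservative answer on a dead path
                }
                continue;
            }
            sawLiveInput = true;
            if (!dominates(dom, in, skipDeadInputs)) {
                return false;
            }
        }
        return sawLiveInput;
    }

    public static void main(String[] args) {
        Ctrl start = new Ctrl("start");
        Ctrl branch = new Ctrl("if", start);
        Ctrl live = new Ctrl("live-path", branch);
        // Region whose first input is dead and whose second input is live.
        Ctrl region = new Ctrl("region", null, live);

        System.out.println("conservative: " + dominates(branch, region, false));
        System.out.println("skip dead:    " + dominates(branch, region, true));
    }
}
```

In this model the conservative query gives up at the dead first input of the region, while the variant that skips dead inputs still finds that `if` dominates `region` through the live path.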
This patch allows `Node::dominates` to try other paths of the last multi-input Region node when the first path is dead, and makes `ConcurrentHashMap` iteration ~30% faster:
```
Benchmark                            (nkeys)  Mode  Cnt       Score       Error  Units
Maps.testConcurrentHashMapIterators    10000  avgt   15  414099.085 ± 33230.945  ns/op  # baseline
Maps.testConcurrentHashMapIterators    10000  avgt   15  315490.281 ±  3037.056  ns/op  # patch
```
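For readers unfamiliar with the pattern being measured, the following is a minimal sketch of a `ConcurrentHashMap` iteration loop. It is not the source of `Maps.testConcurrentHashMapIterators`; the class name `ChmIterationSketch` and the helpers `build` and `sumValues` are hypothetical. The point is that the for-each loop creates a short-lived iterator that does not escape the method, which is the kind of object scalar replacement targets. Whether C2 actually removes the allocation depends on inlining and the other conditions described above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ChmIterationSketch {
    /** Builds a map with keys 0..nkeys-1, each mapped to itself. */
    static ConcurrentHashMap<Integer, Integer> build(int nkeys) {
        ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();
        for (int i = 0; i < nkeys; i++) {
            map.put(i, i);
        }
        return map;
    }

    /** Sums all values; the iterator behind the for-each loop stays local to this method. */
    static long sumValues(ConcurrentHashMap<Integer, Integer> map) {
        long sum = 0;
        for (Map.Entry<Integer, Integer> entry : map.entrySet()) {
            sum += entry.getValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        ConcurrentHashMap<Integer, Integer> map = build(10000);
        System.out.println(sumValues(map));
    }
}
```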
- links to
  - Commit(master) openjdk/jdk/965dd1ac
  - Review(master) openjdk/jdk23u/83
  - Review(master) openjdk/jdk/19496