- Enhancement
- Resolution: Fixed
- P4
- 21, 23, 24
- b22
Noticed this while looking at Leyden profiles. C2 seems to spend considerable time in this code:
void PhaseIdealLoop::Dominators() {
...
NTarjan *ntarjan = NEW_RESOURCE_ARRAY(NTarjan,C->unique()+1);
// Initialize _control field for fast reference
int i;
for( i= C->unique()-1; i>=0; i-- )
ntarjan[i]._control = nullptr;
The disassembly shows this loop is fairly hot. Replacing the initialization with memset, while touching more memory, is apparently faster. We seem to touch a lot of (all?) these structs later on, so pulling them into cache with memset is likely "free".
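For illustration only, here is a minimal standalone sketch of the two initialization strategies. It is not the HotSpot code: `TarjanEntry` is a hypothetical stand-in for `NTarjan` (its fields are assumed), and the helper names `init_with_loop`, `init_with_memset` and `all_controls_null` are made up for this example. It also assumes a platform where a null pointer is represented by all-zero bytes, which holds on mainstream targets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

struct Node;  // opaque; only pointers to it are stored

// Hypothetical stand-in for NTarjan (field layout is an assumption).
// All fields are pointers or integers, so zero bytes are a valid empty state.
struct TarjanEntry {
  Node*        _control;
  unsigned     _semi;
  unsigned     _size;
  TarjanEntry* _parent;
  TarjanEntry* _label;
};

// memset is only well-defined on trivially copyable types.
static_assert(std::is_trivially_copyable<TarjanEntry>::value,
              "TarjanEntry must be trivially copyable to use memset");

// Shape of the loop in the report: one pointer store per element,
// strided by sizeof(TarjanEntry); other fields are left untouched.
void init_with_loop(TarjanEntry* entries, int count) {
  for (int i = count - 1; i >= 0; i--) {
    entries[i]._control = nullptr;
  }
}

// Alternative: zero every byte of the array in a single call.
// Writes more memory, but as one contiguous block.
void init_with_memset(TarjanEntry* entries, int count) {
  std::memset(entries, 0, sizeof(TarjanEntry) * static_cast<std::size_t>(count));
}

// Helper for checking the result of either strategy.
bool all_controls_null(const TarjanEntry* entries, int count) {
  for (int i = 0; i < count; i++) {
    if (entries[i]._control != nullptr) return false;
  }
  return true;
}
```

Both variants leave `_control` null for every element. The difference is that the memset variant also clears the remaining fields, so it is only a drop-in replacement if later code does not rely on those fields keeping their previous contents.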
- links to
- Commit(master) openjdk/jdk/e659d9da
- Review(master) openjdk/jdk/21690