Enhancement
Resolution: Fixed
P4
9
b115
generic
generic
While investigating 'Free CSet' time with a micro benchmark, we noticed that the time this phase takes increases with the number of parallel GC threads. Even though the phase is single-threaded, profiles show that with more threads the time spent in FromCardCache::clear increases significantly. Though it does not show up in a Solaris Studio profile, a VTune profile shows a very high number of cache misses in that method.
Thomas Schatzl provided a patch that improves the memory access pattern by transposing the dimensions of the from card cache. Clearing the data structure for a particular region then accesses memory linearly instead of with the previous strided access.
The patch reduces total Free CSet time on x86_64 Linux systems by about 20% with 16 GC threads and about 11% with 4 GC threads:
jdk build   gcthreads   avg free cset
base        16          36.477
patch       16          29.123
base        4           33.796
patch       4           30.053
Still waiting for Solaris SPARC results. More investigation is needed on why this helps.
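The effect of transposing the dimensions described above can be illustrated with a small, self-contained sketch. This is not the HotSpot code: the class CardCacheSketch, its Layout enum, the kInvalidCard marker and the assumption that the old layout was indexed [worker][region] while the patched one is indexed [region][worker] are all hypothetical and chosen only for illustration. Under that assumption, clearing one region touches one entry per GC worker, so the old layout jumps by the number of regions between entries, while the transposed layout touches adjacent entries.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical marker for "no card cached"; not taken from HotSpot.
constexpr int kInvalidCard = -1;

// Illustrative stand-in for a per-(region, worker) cache stored in one flat array.
class CardCacheSketch {
public:
  enum class Layout {
    WorkerMajor,  // assumed old layout: data[worker][region]
    RegionMajor   // assumed patched layout: data[region][worker]
  };

  CardCacheSketch(Layout layout, std::size_t num_workers, std::size_t num_regions)
      : layout_(layout),
        num_workers_(num_workers),
        num_regions_(num_regions),
        data_(num_workers * num_regions, kInvalidCard) {}

  // Flat index of the entry for (region, worker) under the chosen layout.
  std::size_t index(std::size_t region, std::size_t worker) const {
    assert(region < num_regions_ && worker < num_workers_);
    return layout_ == Layout::WorkerMajor ? worker * num_regions_ + region
                                          : region * num_workers_ + worker;
  }

  void set(std::size_t region, std::size_t worker, int card) {
    data_[index(region, worker)] = card;
  }

  int get(std::size_t region, std::size_t worker) const {
    return data_[index(region, worker)];
  }

  // Invalidate the entries of every worker for a single region.
  void clear(std::size_t region) {
    for (std::size_t worker = 0; worker < num_workers_; ++worker) {
      data_[index(region, worker)] = kInvalidCard;
    }
  }

  // Distance, in elements, between entries written by consecutive
  // iterations of the loop in clear().
  std::size_t clear_stride() const {
    return layout_ == Layout::WorkerMajor ? num_regions_ : 1;
  }

private:
  Layout layout_;
  std::size_t num_workers_;
  std::size_t num_regions_;
  std::vector<int> data_;
};
```

With 1000 regions, for example, clear() in the WorkerMajor layout writes entries that are 1000 elements apart, so each write is likely to land on a different cache line, and the number of such writes grows with the number of GC workers. In the RegionMajor layout the same writes are contiguous. This would be consistent with the cache misses seen in the VTune profile, but it is only a plausible reading, not a confirmed explanation.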