- Enhancement
- Resolution: Fixed
- P3
- 7
- b150
- sparc, sparc_64
- generic

Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-8208890 | 8u201 | Muthusamy Chinnathambi | P3 | Resolved | Fixed | b01 |
JDK-8182730 | 8u192 | Muthusamy Chinnathambi | P3 | Resolved | Fixed | b01 |
JDK-8216710 | emb-8u201 | Unassigned | P3 | Resolved | Fixed | master |
Analysis by Prasad Vidhyabaskaran and Jeff Oplinger
Loads that follow BIS (Block Init Store) based allocation prefetches, which are the default on SPARC processors, suffer from partial RAW (Read After Write) hazards. On T7 (C4 core), partial RAW hazards are slower than full RAW hazards. When SW prefetches are used for allocation prefetch (-XX:AllocatePrefetchInstr=0), the partial RAW hazard is eliminated and the resulting full RAW hazards are handled more efficiently, which improves performance.
Performance was measured with the JMH test suite (on SPARC T7), the Aurora tool (on SPARC T4), and standalone SPECjbb2005 runs (on SPARC T7).
- 70% of JMH test cases improved, by anywhere from 1% to more than 4x, with 30% gaining more than 5%. 28% of the tests regressed, each by less than 5%.
- Results from the Aurora runs are available at the following link:
http://aurora.se.oracle.com/performance/reporting/report/prasad.vidhyabaskaran.java_jvm_prefetch_flag_eval_solaris_sparc?mode=prasad.vidhyabaskaran.style3.instr1
These show improvements of between 1% and 5% on most workloads, and a small regression on only the SPECjvm2008.serial workload.
- On SPECjbb2005, runs with lower numbers of warehouse threads improved by 1% to 2.7%, while the peak warehouse step, where memory bandwidth was exercised heavily, regressed by less than 2%.
The performance gains are measurably larger than the few regressions noted. The recommendation is to change the default allocation prefetch choice to SW prefetches (AllocatePrefetchInstr=0) on SPARC T7 processors.
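To check which allocation prefetch setting a given JVM is actually running with, something like the following sketch may help. It reads the AllocatePrefetchInstr flag through the standard HotSpotDiagnosticMXBean and then runs a small allocate-then-read loop. The class name AllocPrefetchCheck and both helper methods are hypothetical, and the loop is only a rough illustration of the "load shortly after allocation" pattern; it is not the JMH, Aurora, or SPECjbb2005 methodology described above and is not expected to reproduce those numbers.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Hypothetical helper class, not part of the JDK or of this change.
public class AllocPrefetchCheck {

    // Returns "value (origin)" for a HotSpot flag, or "unavailable"
    // if this JVM does not expose the flag or the diagnostic bean.
    static String describeFlag(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return "unavailable";
            }
            VMOption opt = bean.getVMOption(name);
            return opt.getValue() + " (" + opt.getOrigin() + ")";
        } catch (IllegalArgumentException e) {
            return "unavailable";
        }
    }

    // Allocates `count` small arrays and reads each one right after
    // writing it, so that loads closely follow allocation.
    // Assumes size >= 2; returns a checksum so the JIT cannot
    // discard the loop as dead code.
    static long allocateAndRead(int count, int size) {
        long sum = 0;
        for (int i = 0; i < count; i++) {
            int[] a = new int[size];
            a[0] = i;
            sum += a[0] + a[size - 1];
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("AllocatePrefetchInstr = "
            + describeFlag("AllocatePrefetchInstr"));

        // Warm up so the timed loop below runs compiled code.
        allocateAndRead(2_000_000, 16);

        long start = System.nanoTime();
        long checksum = allocateAndRead(20_000_000, 16);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("checksum = " + checksum);
        System.out.println("elapsed  = " + elapsedMs + " ms");
    }
}
```

Running it once with the default settings and once with `java -XX:AllocatePrefetchInstr=0 AllocPrefetchCheck` prints the effective flag value and its origin for each run. Any timing difference from such a simple loop should be treated as indicative at best; a proper comparison would use a harness such as JMH.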
- backported by
  - JDK-8182730 Use SW prefetch instructions instead of BIS for allocation prefetches on SPARC Core C4 (Resolved)
  - JDK-8208890 Use SW prefetch instructions instead of BIS for allocation prefetches on SPARC Core C4 (Resolved)
  - JDK-8216710 Use SW prefetch instructions instead of BIS for allocation prefetches on SPARC Core C4 (Resolved)