The DaCapo 'pmd' benchmark exercises class loading heavily through a user-defined class loader (with heavy synchronization). During each iteration, it attempts to load more than 10,000 classes.
During class loading, when a request reaches the bootstrap class loader in the delegation path, the VM calls SystemDictionary::load_instance_class to load the requested class. Once the module system is initialized, the VM checks whether the requested class's package is in a module defined to the boot loader. If the class is in the unnamed package or the unnamed module, or in a module not defined to the boot loader, the VM searches only the boot loader's append entry (or entries). As a simple optimization, if there is no boot append entry, the VM can return NULL immediately in that case without doing any additional work. This optimization noticeably improves DaCapo 'pmd' performance.
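The decision described above can be sketched as a small standalone model. This is not HotSpot code: `BootLoaderModel`, `Lookup`, `package_of` and `lookup` are hypothetical names introduced only for illustration, and the real logic in SystemDictionary::load_instance_class involves considerably more state.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified model of the boot loader's search state (not HotSpot code).
struct BootLoaderModel {
    std::vector<std::string> module_packages;  // packages in modules defined to the boot loader
    std::vector<std::string> append_entries;   // boot append entries; frequently empty
    int append_searches = 0;                   // how many times the append entries were walked
};

enum class Lookup { InBootModule, SearchedAppend, SkippedAppend };

// Package part of a slash-separated class name; an empty string models the unnamed package.
std::string package_of(const std::string& class_name) {
    const auto pos = class_name.rfind('/');
    return pos == std::string::npos ? std::string() : class_name.substr(0, pos);
}

Lookup lookup(BootLoaderModel& loader, const std::string& class_name) {
    const std::string pkg = package_of(class_name);
    if (!pkg.empty()) {
        for (const auto& p : loader.module_packages) {
            if (p == pkg) {
                return Lookup::InBootModule;  // handled by the boot loader's module path
            }
        }
    }
    // Only the append entries could supply this class. With none configured,
    // return right away instead of doing any further work (the optimization).
    if (loader.append_entries.empty()) {
        return Lookup::SkippedAppend;
    }
    ++loader.append_searches;  // stands in for the actual append-entry search
    return Lookup::SearchedAppend;
}
```

In this model, a class from a user-defined loader such as `net/sourceforge/pmd/PMD` takes the `SkippedAppend` path whenever `append_entries` is empty, which is the case the optimization targets.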
Because 'pmd' performance results are quite volatile, the following results were obtained using 'perf stat -r 10', which performs 10 runs. In each run, 'pmd' performs 10 iterations:
Before
====
101,286.60 msec task-clock:u # 7.219 CPUs utilized ( +- 1.17% )
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
840,178 page-faults:u # 8295.090 M/sec ( +- 3.70% )
223,026,170,830 cycles:u # 2201940.351 GHz ( +- 1.49% )
184,479,972,101 instructions:u # 0.83 insn per cycle ( +- 1.15% )
36,672,717,860 branches:u # 362070231.284 M/sec ( +- 1.16% )
668,876,585 branch-misses:u # 1.82% of all branches ( +- 1.11% )
14.0301 +- 0.0786 seconds time elapsed ( +- 0.56% )
After
===
97,250.43 msec task-clock:u # 7.159 CPUs utilized ( +- 1.21% )
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
817,109 page-faults:u # 8402.161 M/sec ( +- 3.90% )
211,539,841,061 cycles:u # 2175219.111 GHz ( +- 1.44% )
185,739,962,653 instructions:u # 0.88 insn per cycle ( +- 0.96% )
36,876,863,372 branches:u # 379196928.453 M/sec ( +- 0.85% )
671,447,151 branch-misses:u # 1.82% of all branches ( +- 0.70% )
13.5849 +- 0.0701 seconds time elapsed ( +- 0.52% )
With the optimization, total execution time improves by ~3% (elapsed time drops from 14.0301 to 13.5849 seconds).
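The improvement figure can be rechecked from the reported means. The following is a minimal sketch; the helper name `relative_improvement_percent` is hypothetical, and the inputs are the elapsed-time, task-clock and cycle values from the perf output above.

```cpp
#include <cassert>

// Relative improvement of `after` over `before`, in percent.
// Positive values mean `after` is smaller (better) than `before`.
double relative_improvement_percent(double before, double after) {
    assert(before > 0.0);
    return (before - after) / before * 100.0;
}

// Means reported by 'perf stat -r 10' in the Before/After sections.
const double elapsed_pct    = relative_improvement_percent(14.0301, 13.5849);
const double task_clock_pct = relative_improvement_percent(101286.60, 97250.43);
const double cycles_pct     = relative_improvement_percent(223026170830.0, 211539841061.0);
```

These work out to roughly 3.2% for elapsed time, 4.0% for task-clock and 5.2% for cycles, each larger than the run-to-run variation perf reports for that metric.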