JDK-8231630

Optimize boot loader with no bootclasspath append entry

    • b18

      The Dacapo 'pmd' benchmark exercises class loading heavily through a user-defined class loader (with heavy synchronization). During each iteration, it attempts to load more than 10,000 classes.

      During class loading, the bootstrap class loader in the delegation path calls SystemDictionary::load_instance_class to load the requested class. Once the module system is initialized, the VM checks whether the requested class's package belongs to a module defined to the boot loader. If the class is in the unnamed package or the unnamed module, or in a module not defined to the boot loader, the VM searches only the boot loader's append entry (or entries). As a simple optimization, if there is no boot append entry, the VM can return NULL immediately in that case without doing any additional work. This optimization noticeably improves Dacapo 'pmd' performance.
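The control flow described above can be sketched as a small, self-contained C++17 model. This is only an illustration of the early return, not the HotSpot code: `BootLoaderModel`, `package_of`, `load_from_boot_loader`, and the `append_searches` counter are hypothetical names, a `std::nullopt` result stands in for returning NULL, and each boot append entry is modeled as a plain set of class names.

```cpp
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for boot loader state (not HotSpot types).
struct BootLoaderModel {
    // Packages that belong to modules defined to the boot loader.
    std::set<std::string> boot_packages;
    // Boot append entries, each modeled as the set of class names it contains.
    std::vector<std::set<std::string>> append_entries;
    // Number of times the append entries were actually searched.
    int append_searches = 0;
};

// "java/lang/String" -> "java/lang"; a name without '/' is in the unnamed package.
std::string package_of(const std::string& class_name) {
    const auto pos = class_name.rfind('/');
    return pos == std::string::npos ? std::string() : class_name.substr(0, pos);
}

// Returns where the class was found, or std::nullopt if the boot loader cannot load it.
std::optional<std::string> load_from_boot_loader(BootLoaderModel& boot,
                                                 const std::string& class_name) {
    const std::string pkg = package_of(class_name);
    const bool in_boot_module = !pkg.empty() && boot.boot_packages.count(pkg) > 0;
    if (in_boot_module) {
        // Assumption for the model: classes in boot modules are always found.
        return "boot-module:" + class_name;
    }

    // Fast path from this issue: only the append entries could hold the class,
    // so with no append entry there is nothing left to search.
    if (boot.append_entries.empty()) {
        return std::nullopt;
    }

    // Slow path: search each append entry in order.
    ++boot.append_searches;
    for (const auto& entry : boot.append_entries) {
        if (entry.count(class_name) > 0) {
            return "boot-append:" + class_name;
        }
    }
    return std::nullopt;
}
```

In this model, a request for an application class such as `com/example/Foo` returns immediately and leaves `append_searches` at zero while `append_entries` is empty, which is the common case when no entry has been added with an option such as `-Xbootclasspath/a`. Every delegated lookup from a user-defined loader passes through this path, so the saving applies to each of the more than 10,000 classes 'pmd' tries to load per iteration.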

      Because 'pmd' performance results are quite volatile, the following results were obtained using 'perf stat -r 10', which repeats the measurement over 10 runs. Each run consists of 10 'pmd' iterations:

      Before
      ====
      101,286.60 msec task-clock:u              #    7.219 CPUs utilized            ( +-  1.17% )
                       0      context-switches:u        #    0.000 K/sec                  
                       0      cpu-migrations:u          #    0.000 K/sec                  
                 840,178      page-faults:u             # 8295.090 M/sec                    ( +-  3.70% )
         223,026,170,830      cycles:u                  # 2201940.351 GHz                   ( +-  1.49% )
         184,479,972,101      instructions:u            #    0.83  insn per cycle           ( +-  1.15% )
          36,672,717,860      branches:u                # 362070231.284 M/sec               ( +-  1.16% )
             668,876,585      branch-misses:u           #    1.82% of all branches          ( +-  1.11% )

                 14.0301 +- 0.0786 seconds time elapsed  ( +-  0.56% )

      After
      ===
      97,250.43 msec task-clock:u              #    7.159 CPUs utilized            ( +-  1.21% )
                       0      context-switches:u        #    0.000 K/sec                  
                       0      cpu-migrations:u          #    0.000 K/sec                  
                 817,109      page-faults:u             # 8402.161 M/sec                    ( +-  3.90% )
         211,539,841,061      cycles:u                  # 2175219.111 GHz                   ( +-  1.44% )
         185,739,962,653      instructions:u            #    0.88  insn per cycle           ( +-  0.96% )
          36,876,863,372      branches:u                # 379196928.453 M/sec               ( +-  0.85% )
             671,447,151      branch-misses:u           #    1.82% of all branches          ( +-  0.70% )

                 13.5849 +- 0.0701 seconds time elapsed  ( +-  0.52% )

      With the optimization, total elapsed time improves by about 3% (from 14.03 to 13.58 seconds).
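The percentage can be rechecked from the figures in the perf output. The helper below is a minimal sketch: `percent_reduction` and the constant names are illustrative, and the values are copied from the Before and After blocks. Elapsed time comes out a little over 3%, and the task-clock total drops by about 4%.

```cpp
// Relative reduction from `before` to `after`, as a percentage of `before`.
double percent_reduction(double before, double after) {
    return (before - after) / before * 100.0;
}

// Values copied from the perf stat output above.
const double kElapsedBeforeSec = 14.0301;
const double kElapsedAfterSec = 13.5849;
const double kTaskClockBeforeMs = 101286.60;
const double kTaskClockAfterMs = 97250.43;

// Elapsed wall-clock time and total CPU time, each as a percent reduction.
const double kElapsedGainPct = percent_reduction(kElapsedBeforeSec, kElapsedAfterSec);
const double kTaskClockGainPct = percent_reduction(kTaskClockBeforeMs, kTaskClockAfterMs);
```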

            jiangli Jiangli Zhou