JDK-8149973

Optimize object alignment check in debug builds.

Details

    • Type: Enhancement
    • Resolution: Fixed
    • Priority: P4
    • Fix Version: 9
    • Affects Version: 9
    • Component: hotspot
    • Labels: None
    • Subcomponent: gc
    • Resolved In Build: b112

    Description

      While profiling fastdebug builds with the perf tool on Linux, I noticed that the hottest method is sometimes oopDesc::decode_heap_oop_not_null(unsigned int) in src/share/vm/oops/oop.inline.hpp.

      Here is part of the report (resolved symbols only) for TestGCOld, which was run with the arguments "20 200 10 100 5000":
       17.92% java libjvm.so [.] oopDesc::decode_heap_oop_not_null(unsigned int)
        9.24% java libjvm.so [.] G1ParScanThreadState::copy_to_survivor_space(InCSetState, oop, markOopDesc*)
        5.53% java libjvm.so [.] G1ParScanThreadState::verify_ref(unsigned int*) const
      ...

      The perf annotation shows that most of the time is spent in the check_obj_alignment(oop obj) function, which is inlined into oopDesc::decode_heap_oop_not_null.

      inline bool check_obj_alignment(oop obj) {
        return cast_from_oop<intptr_t>(obj) % MinObjAlignmentInBytes == 0;
      }

      oop oopDesc::decode_heap_oop_not_null(narrowOop v) {
        assert(!is_null(v), "narrow oop value can never be zero");
        address base = Universe::narrow_oop_base();
        int shift = Universe::narrow_oop_shift();
        oop result = (oop)(void*)((uintptr_t)base + ((uintptr_t)v << shift));
        assert(check_obj_alignment(result), "address not aligned: " INTPTR_FORMAT, p2i((void*) result));
        return result;
      }

      check_obj_alignment is called only inside an assert, and therefore only in debug builds. It seems that the division in this function can be optimized away.
      MinObjAlignmentInBytes is initialized to ObjectAlignmentInBytes in arguments.cpp, and ObjectAlignmentInBytes must be a power of two. Therefore we can use '(cast_from_oop<intptr_t>(obj) & MinObjAlignmentInBytesMask) == 0', where MinObjAlignmentInBytesMask is equal to 'MinObjAlignmentInBytes - 1' (also initialized in arguments.cpp).

      That is, check_obj_alignment would look like this:
      inline bool check_obj_alignment(oop obj) {
        return (cast_from_oop<intptr_t>(obj) & MinObjAlignmentInBytesMask) == 0;
      }
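
      The equivalence this relies on can be checked in isolation. The following is a minimal standalone sketch, not HotSpot code: the globals and helper names (min_obj_alignment_in_bytes, set_object_alignment, aligned_by_modulo, aligned_by_mask, checks_agree) are hypothetical stand-ins for the runtime-initialized values described above. It assumes a two's-complement intptr_t, in which case the two checks should agree for negative values as well.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the runtime globals; they are plain variables
// (not compile-time constants) because the real values are set at startup.
static int min_obj_alignment_in_bytes = 8;
static int min_obj_alignment_in_bytes_mask = 7;

// True when n is a positive power of two.
inline bool is_power_of_two(intptr_t n) {
  return n > 0 && (n & (n - 1)) == 0;
}

// Mirrors the initialization described in the text: mask = alignment - 1.
inline void set_object_alignment(int alignment) {
  assert(is_power_of_two(alignment));
  min_obj_alignment_in_bytes = alignment;
  min_obj_alignment_in_bytes_mask = alignment - 1;
}

// Original form of the check: needs a division by a runtime value.
inline bool aligned_by_modulo(intptr_t addr) {
  return addr % min_obj_alignment_in_bytes == 0;
}

// Proposed form of the check: a single bitwise AND.
inline bool aligned_by_mask(intptr_t addr) {
  return (addr & min_obj_alignment_in_bytes_mask) == 0;
}

// Returns true if both checks give the same answer for every value in [lo, hi].
inline bool checks_agree(intptr_t lo, intptr_t hi) {
  for (intptr_t a = lo; a <= hi; ++a) {
    if (aligned_by_modulo(a) != aligned_by_mask(a)) return false;
  }
  return true;
}
```

      Because the alignment is only known at run time, a compiler generally cannot turn the '%' into a mask on its own, which is presumably why the division shows up in the profile.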

      I implemented that approach, and with it decode_heap_oop_not_null is no longer the hottest method:
       10.28% java libjvm.so [.] G1ParScanThreadState::copy_to_survivor_space(InCSetState, oop, markOopDesc*)
        8.52% java libjvm.so [.] oopDesc::decode_heap_oop_not_null(unsigned int)
        6.99% java libjvm.so [.] SpaceMangler::mangle_region(MemRegion)

      Similar optimizations can be made in the following functions:
      1) G1UpdateRSOrPushRefOopClosure::do_oop_work (hotspot/src/share/vm/gc/g1/g1OopClosures.inline.hpp):
      template <class T>
      inline void G1UpdateRSOrPushRefOopClosure::do_oop_work(T* p) {
      ...
        assert((intptr_t)o % MinObjAlignmentInBytes == 0, "not oop aligned");
      ...
      }

      2) G1RemSet::par_write_ref (hotspot/src/share/vm/gc/g1/g1RemSet.inline.hpp):
      template <class T>
      inline void G1RemSet::par_write_ref(HeapRegion* from, T* p, uint tid) {
      ...
        assert((intptr_t)o % MinObjAlignmentInBytes == 0, "not oop aligned");
      ...
      }
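
      For these two asserts, the mask-based form might look roughly like the sketch below. It is standalone and illustrative only: kMinObjAlignmentInBytesMask, is_oop_aligned and do_oop_work_sketch are hypothetical names, and an 8-byte alignment is assumed so that the mask is 7. In HotSpot itself the existing MinObjAlignmentInBytesMask would be used instead.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for MinObjAlignmentInBytesMask
// (assumes an alignment of 8 bytes, so the mask is 7).
static const intptr_t kMinObjAlignmentInBytesMask = 7;

// Mask-based form of the "not oop aligned" check for a raw pointer.
inline bool is_oop_aligned(const void* o) {
  return (reinterpret_cast<intptr_t>(o) & kMinObjAlignmentInBytesMask) == 0;
}

// Sketch of a closure body: the assert uses the mask instead of '%'.
template <class T>
inline void do_oop_work_sketch(T* o) {
  assert(is_oop_aligned(o) && "not oop aligned");
  (void)o;  // real code would process the reference here
}
```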


      The suggested construction is already used in MacroAssembler code, e.g. in hotspot/src/cpu/x86/vm/c1_MacroAssembler_x86.cpp:
      void C1_MacroAssembler::initialize_object(Register obj, Register klass, Register var_size_in_bytes, int con_size_in_bytes, Register t1, Register t2, bool is_tlab_allocated) {
        assert((con_size_in_bytes & MinObjAlignmentInBytesMask) == 0,
               "con_size_in_bytes is not multiple of alignment");
      ...
      }


      Overall, I think this should speed up test execution on fastdebug builds.

          People

            ddmitriev Dmitry Dmitriev