C2: Pre/Main/Post for foreach leaves never-executed post loop (AArch64)
=======================================================================

In the foreach case, the disassembly shows:
- a hot loop
- a post loop that checks uncommon cases and can finish the loop if any iterations remain

Two observations:
- The induction variable (i) is shared between the hot loop and the post loop. This keeps it live-out of the hot loop, which forces a register rename (the `mov w13, w11` at the top of B12) and makes the hot loop slightly less efficient.
- In this example, the post loop is never executed; it is effectively dead code.

------------------------------------------------------------
// main hot counted loop
for (; i < size; i++) {
    bh.consume(a[i]);
}
------------------------------------------------------------
;; B12: # out( B13 ) <- in( B13 ) top-of-loop Freq: 986885
0x000000010cf0e8b0: mov w13, w11  ;*getfield cursor {reexecute=0 rethrow=0 return_oop=0}
;; B13: # out( B12 B14 ) <- in( B10 B12 ) Loop( B13-B12 inner main of N65) Freq: 986886
0x000000010cf0e8b4: add x11, x10, w13, sxtw #2
0x000000010cf0e8b8: ldr w14, [x11, #0x10]
0x000000010cf0e8bc: lsl x11, x14, #3  ;*invokestatic consumeCompiler {reexecute=0 rethrow=0 return_oop=0}
0x000000010cf0e8c0: add w11, w13, #0x1  ;*iadd {reexecute=0 rethrow=0 return_oop=0}
0x000000010cf0e8c4: cmp w11, w12
0x000000010cf0e8c8: b.lt #-0x18  ;*ifeq {reexecute=0 rethrow=0 return_oop=0}
------------------------------------------------------------
// post-loop
if (i < size) {
    if (i >= len)                  // B25/B26
        uncommon_trap("loop_limit_check");
    if (modCount != l->modCount)   // B22
        uncommon_trap("predicate");
    // B17–B20 (inner post counted loop)
    do {
        bh.consume(a[i]);
        i++;
    } while (i < size);
}
------------------------------------------------------------
;; B14: # out( B21 B15 ) <- in( B23 B13 ) Freq: 0,999991
0x000000010cf0e8cc: cmp w11, w16
0x000000010cf0e8d0: b.ge #0x48
0x000000010cf0e8d4: cmp w11, w29
0x000000010cf0e8d8: b.hs #0xc0
0x000000010cf0e8dc: cmp w11, w29
0x000000010cf0e8e0: b.hs #0xc8  ;*getfield cursor {reexecute=0 rethrow=0 return_oop=0}
0x000000010cf0e8e4: cmp w11, w16
0x000000010cf0e8e8: b.ge #0x110  ;*if_icmplt {reexecute=0 rethrow=0 return_oop=0}
0x000000010cf0e8ec: add x12, x10, w11, sxtw #2
0x000000010cf0e8f0: ldr w14, [x12, #0x10]
0x000000010cf0e8f4: cmp w11, w29
0x000000010cf0e8f8: b.ge #0xd0  ;*if_icmplt {reexecute=0 rethrow=0 return_oop=0}
0x000000010cf0e8fc: lsl x12, x14, #3  ;*aaload {reexecute=0 rethrow=0 return_oop=0}
0x000000010cf0e900: add w14, w11, #0x1  ;*invokestatic consumeCompiler {reexecute=0 rethrow=0 return_oop=0}
0x000000010cf0e904: cmp w14, w16
0x000000010cf0e908: b.ge #0x10  ;*ifeq {reexecute=0 rethrow=0 return_oop=0}
0x000000010cf0e90c: mov w13, w11
0x000000010cf0e910: mov w11, w14
0x000000010cf0e914: b #-0x30  ;*return {reexecute=0 rethrow=0 return_oop=0}
------------------------------------------------------------

Experiment in HotSpot: disable the pre/main/post split (e.g., force should_rce := false in iteration_split_impl)
====================================================================

Result: the post loop disappears, but the hot loop becomes less efficient. Two checks remain on every iteration (the `cmp`/`b.ge` pairs at ...c348 and ...c350), which blocks a clean counted loop and the optimizations that depend on it:

29.15%  0x000000011040c340: mov w16, w14
 0.53%  0x000000011040c344: mov w14, w13  ;*getfield cursor {reexecute=0 rethrow=0 return_oop=0}
 0.83%  0x000000011040c348: cmp w14, w10
 0.65%  0x000000011040c34c: b.ge #0x3c  ;*if_icmplt {reexecute=0 rethrow=0 return_oop=0}
30.81%  0x000000011040c350: cmp w14, w15
 0.41%  0x000000011040c354: b.ge #0x68  ;*if_icmplt {reexecute=0 rethrow=0 return_oop=0}
        0x000000011040c358: add x12, x11, w14, sxtw #2
27.38%  0x000000011040c35c: ldr w12, [x12, #0x10]
 0.41%  0x000000011040c360: add w13, w14, #0x1  ;*iadd {reexecute=0 rethrow=0 return_oop=0}
 0.59%  0x000000011040c364: lsl x12, x12, #3  ;*invokestatic consumeCompiler {reexecute=0 rethrow=0 return_oop=0}
 0.41%  0x000000011040c368: cmp w13, w10
        0x000000011040c36c: b.lt #-0x2c  ;*return {reexecute=0 rethrow=0 return_oop=0}
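To make the two shapes concrete, here is a source-level Java model of a main/post split for a single range check. It is a sketch under stated assumptions, not what C2 actually emits:
- The names `PreMainPostSketch`, `checkedLoop`, `splitLoop` and `postIterations` are hypothetical.
- The pre loop is omitted because the loop starts at 0.
- The explicit `i >= a.length` test stands in for the `i >= len` check from the post-loop pseudocode.

The main loop's limit is clamped so the check is provably true inside it. The post loop keeps the check, and the shared `i` stays live after the main loop exits. When `size <= a.length` (assumed here, as for an array-backed list), the post loop runs zero times.

```java
// Source-level model of a main/post loop split around one range check.
// Hypothetical names; this mirrors the loop shape, not C2's real IR or code.
public class PreMainPostSketch {

    // Number of post-loop iterations executed by the most recent splitLoop call.
    static int postIterations;

    // Unsplit shape: the explicit range check is evaluated on every iteration,
    // similar in spirit to the experiment with the split disabled.
    static long checkedLoop(int[] a, int size) {
        long sum = 0;
        for (int i = 0; i < size; i++) {
            if (i >= a.length) {
                throw new IllegalStateException("range check failed at " + i);
            }
            sum += a[i];
        }
        return sum;
    }

    // Split shape: the main loop runs over [0, min(size, a.length)), where the
    // check cannot fail, so it is dropped. The post loop keeps the check and
    // finishes any remaining iterations. The induction variable i is shared,
    // so it is live-out of the main loop.
    static long splitLoop(int[] a, int size) {
        postIterations = 0;
        long sum = 0;
        int i = 0;
        int mainLimit = Math.min(size, a.length);
        for (; i < mainLimit; i++) {   // main loop: no explicit check
            sum += a[i];
        }
        for (; i < size; i++) {        // post loop: entered only if size > a.length
            postIterations++;
            if (i >= a.length) {       // stands in for the uncommon-trap path
                throw new IllegalStateException("range check failed at " + i);
            }
            sum += a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5};
        System.out.println(checkedLoop(a, 5) + " " + splitLoop(a, 5) + " post=" + postIterations);
    }
}
```

Running `main` prints `15 15 post=0`. Both shapes compute the same result, and the post loop never runs when `size <= a.length`. In this model the post loop is reachable only on the failing path, which mirrors the observation that it is effectively dead code in the foreach case.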