In the attached program Foo.java, the C2-generated x86_64 code implements the line 'count++' with explicit load and store instructions and a register increment, where it could be implemented with a single memory increment instruction:
$ java -Xcomp -XX:-TieredCompilation -XX:CompileOnly=Foo::main -XX:+PrintOptoAssembly -XX:LoopUnrollLimit=0 Foo.java
(..)
movl RDX, [R10 + #112 (8-bit)]
(..)
incl RDX
movl [R10 + #112 (8-bit)], RDX
(..)
This could be implemented with the more compact form:
incl [R10 + #112 (8-bit)]
UPDATE: The result of 'count++' needs to be passed to 'println' in register RDX, hence using the proposed change would require a load instruction before the call to 'println':
movl RDX, [R10 + #112 (8-bit)]
This would still reduce code size slightly, but it is unclear whether it would improve performance at all.
The assembly-level CFG is attached for clarity.
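For readers without access to the attachment, the following is a minimal sketch of what a reproducer of this shape might look like. It is not the attached Foo.java: the static field, the loop bound of 10, and the exact placement of the 'println' call are assumptions chosen only to match the description above (a 'count++' on a field whose updated value is then passed to 'println').

```java
// Hypothetical reproducer sketch; NOT the attached Foo.java.
// Assumption: 'count' is a static int field, so the increment is compiled
// as a load, a register increment, and a store back to the field.
class Foo {
    static int count = 0;

    public static void main(String[] args) {
        // Assumed loop bound; the original program's bound is not shown.
        for (int i = 0; i < 10; i++) {
            count++;                   // the line under discussion
            System.out.println(count); // the updated value is needed as a call argument
        }
    }
}
```

Because the incremented value is consumed by the call right after the store, the register holding it is still live, which is why a memory-increment form would need the extra load noted in the UPDATE.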