Consider the following common pattern, taken from the way javac updates its flags:
// $ find ./src/jdk.compiler/ -type f -print | xargs grep -e "flags_field &=" -e "flags_field |="
class BitSetAndReset {
    private static long andq = 0x8000_0000_0000_0000L, orq = 0;
    private static final long MASKq = 0x8000_0000_0000_0000L;
    public static void main(String... args) {
        testq();
        if (andq != 0 || orq != 0x8000_0000_0000_0000L)
            throw new AssertionError("64-bit logical operator failure");
    }
    public static void testq() {
        andq &= ~MASKq;
        orq |= MASKq;
    }
}
Running it with:
$ java -XX:+PrintOptoAssembly -Xcomp -XX:-Inline -XX:CompileOnly=BitSetAndReset::testq BitSetAndReset
shows how it's currently assembled:
00c movq R10, #-9223372036854775808 # long
016 movq R11, java/lang/Class:exact * # ptr
020 orq [R11 + #112 (8-bit)], R10 # long ! Field: BitSetAndReset.orq
024 movq R10, #9223372036854775807 # long
02e andq [R11 + #104 (8-bit)], R10 # long ! Field: BitSetAndReset.andq
which could be improved using BTS/BTR instructions as:
00c movq R10, java/lang/Class:exact * # ptr
016 btrq [R10 + #104 (8-bit)], log2(not(#9223372036854775807)) # long ! Field: BitSetAndReset.andq
01c btsq [R10 + #112 (8-bit)], log2(#-9223372036854775808) # long ! Field: BitSetAndReset.orq
First, only three instructions and one register are needed, instead of five instructions and two registers.
Second, the instruction encoding is more compact: the 64-bit immediate mask, which currently has to be loaded into a register by a separate movq, is replaced by an 8-bit immediate bit index.
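The rewrite applies only when the mask (or, for the AND case, its complement) has exactly one bit set, so that the BTS/BTR immediate is simply log2 of that value. The following is a minimal Java sketch of that equivalence, not the compiler change itself; the class and helper names (BitIndexSketch, isSingleBit, bitIndex, bts, btr) are hypothetical and exist only for illustration.

```java
// Illustrative sketch only: models in plain Java what BTS/BTR do to a 64-bit value.
// All names here are hypothetical and are not part of HotSpot.
final class BitIndexSketch {
    // A mask qualifies only if exactly one bit is set (it is a power of two).
    static boolean isSingleBit(long mask) {
        return Long.bitCount(mask) == 1;
    }

    // log2 of a single-bit mask, i.e. the 8-bit immediate BTS/BTR would take.
    static int bitIndex(long mask) {
        if (!isSingleBit(mask)) {
            throw new IllegalArgumentException("mask must have exactly one bit set");
        }
        return Long.numberOfTrailingZeros(mask);
    }

    // Same result as "value |= mask" for the single-bit mask at this index.
    static long bts(long value, int index) {
        return value | (1L << index);
    }

    // Same result as "value &= ~mask" for the single-bit mask at this index.
    static long btr(long value, int index) {
        return value & ~(1L << index);
    }

    public static void main(String[] args) {
        long mask = 0x8000_0000_0000_0000L;
        int index = bitIndex(mask);

        long orq = 0L;
        long andq = 0x8000_0000_0000_0000L;

        // The index-based form must agree with the mask-based form.
        if (bts(orq, index) != (orq | mask) || btr(andq, index) != (andq & ~mask)) {
            throw new AssertionError("bit-index form disagrees with mask form");
        }
        System.out.println("bit index = " + index); // prints "bit index = 63"
    }
}
```

For the AND case the check is made on the complement of the constant that appears in the code: andq &= 0x7FFF_FFFF_FFFF_FFFFL qualifies because ~0x7FFF_FFFF_FFFF_FFFFL has a single bit set, which is what log2(not(...)) expresses in the listing above.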