Enhancement | Resolution: Fixed | P4 | repo-lilliput
Klass decode depends on several runtime parameters. In 64-bit header mode, these are:
- UseCompactObjectHeaders
- Encoding Base
- Encoding Shift
In Legacy header mode, these are:
- UseCompactObjectHeaders
- UseCompressedClassPointers
- Encoding Base
- Encoding Shift
These values are stored at distinct locations and require three loads in 64-bit header mode and four in Legacy mode (see disassembly [1]). Unfortunately, Legacy mode becomes more expensive, since it now needs to load two switches (UseCompactObjectHeaders and UseCompressedClassPointers).
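To make the load count concrete, here is a minimal C++ sketch of the status quo. The global names (`g_encoding_base`, `g_encoding_shift`), the two-word header model, and the bit position of the narrow Klass in the compact mark word are hypothetical simplifications, not HotSpot's actual code; the sketch also assumes that compact headers imply compressed class pointers, which is why 64-bit header mode has one switch fewer.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the runtime parameters. Each lives at its own
// address, so reading all of them costs one memory load per value.
bool      UseCompactObjectHeaders    = false;
bool      UseCompressedClassPointers = true;
uintptr_t g_encoding_base            = 0;
unsigned  g_encoding_shift           = 0;

// Simplified header model (hypothetical, not HotSpot's real layout).
struct Header {
  uint64_t mark;        // compact headers: narrow Klass in the upper 32 bits
  uint64_t klass_word;  // legacy: narrow Klass (low 32 bits) or full pointer
};

constexpr int kCompactKlassShift = 32;  // hypothetical bit position

uintptr_t decode_klass(const Header& h) {
  if (UseCompactObjectHeaders) {                 // load 1
    uint32_t nk = uint32_t(h.mark >> kCompactKlassShift);
    // Loads 2 and 3: base and shift (3 loads in 64-bit header mode).
    return g_encoding_base + (uintptr_t(nk) << g_encoding_shift);
  }
  if (UseCompressedClassPointers) {              // load 2, Legacy mode only
    uint32_t nk = uint32_t(h.klass_word);
    // Loads 3 and 4: base and shift (4 loads in Legacy mode).
    return g_encoding_base + (uintptr_t(nk) << g_encoding_shift);
  }
  return uintptr_t(h.klass_word);                // uncompressed Klass pointer
}
```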
I would like to minimize the number of loads. There are several ways to do this, but the simplest would be a denser in-memory representation of these values, since we always load them together anyway.
All four values (UseCompactObjectHeaders, UseCompressedClassPointers, encoding base and shift) can be encoded into a single 64-bit value. The encoding base will always be page-aligned, so with pages of at least 4 KB its low 12 bits are always zero; that alignment shadow can hold all the other information. UseCompactObjectHeaders and UseCompressedClassPointers need a single bit each. The encoding shift will not be larger than 31 (at least none is planned so far), so it fits in 5 bits. That is 7 bits in total, well within the 12 available.
As a result, the three or four loads can be folded into a single 64-bit load without much trouble.
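A minimal C++ sketch of the packing, assuming a 64-bit platform and pages of at least 4 KB. The issue only fixes the bit budget; the concrete bit assignment inside the shadow, the name `g_klass_decode_word`, the helper `pack_decode_word`, and the header model are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout of the packed 64-bit decode word:
//   bits 12..63 : encoding base (page-aligned, so its low 12 bits are zero)
//   bits  0..4  : encoding shift (0..31)
//   bit   5     : UseCompactObjectHeaders
//   bit   6     : UseCompressedClassPointers
//   bits  7..11 : unused
constexpr uint64_t kShiftMask     = 0x1F;
constexpr uint64_t kCompactBit    = uint64_t(1) << 5;
constexpr uint64_t kCompressedBit = uint64_t(1) << 6;
constexpr uint64_t kBaseMask      = ~uint64_t(0xFFF);  // assumes >= 4 KB pages

uint64_t g_klass_decode_word = 0;  // hypothetical; written once at startup

uint64_t pack_decode_word(uintptr_t base, unsigned shift, bool compact,
                          bool compressed) {
  assert((base & 0xFFF) == 0);  // base must be page-aligned
  assert(shift <= 31);          // shift must fit in 5 bits
  return uint64_t(base) | shift | (compact ? kCompactBit : 0) |
         (compressed ? kCompressedBit : 0);
}

// Simplified header model (hypothetical, not HotSpot's real layout).
struct Header {
  uint64_t mark;        // compact headers: narrow Klass in the upper 32 bits
  uint64_t klass_word;  // legacy: narrow Klass (low 32 bits) or full pointer
};
constexpr int kCompactKlassShift = 32;  // hypothetical bit position

uintptr_t decode_klass(const Header& h) {
  const uint64_t w = g_klass_decode_word;  // the single read of runtime state
  const uintptr_t base = uintptr_t(w & kBaseMask);
  const unsigned shift = unsigned(w & kShiftMask);
  if (w & kCompactBit) {
    return base + (uintptr_t(uint32_t(h.mark >> kCompactKlassShift)) << shift);
  }
  if (w & kCompressedBit) {
    return base + (uintptr_t(uint32_t(h.klass_word)) << shift);
  }
  return uintptr_t(h.klass_word);  // uncompressed Klass pointer
}
```

Both header modes now read runtime state exactly once; everything after that is masking and branching on a value already held in a register.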
Alternatives:
- Generating a stub routine is not an option if we want to keep decoding inlined.
- We could generate different variants of the decoding routines via templates, parametrized for each combination of (shift, UseCompressedClassPointers, UseCompactObjectHeaders) and the most common encoding base. But we would then need a different solution for uncommon base addresses, and we would have to decide at runtime which code variant to use, which again introduces a runtime switch to query. So nothing is gained compared with the proposed solution.
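A rough C++17 sketch of the template alternative, illustrating why it does not save the switch. `kCommonBase`, specializing only shift 3, and all other names are hypothetical. `choose_decoder()` still reads the same runtime parameters, and calling through the returned function pointer is an indirect call, which generally prevents inlining.

```cpp
#include <cassert>
#include <cstdint>

// Simplified header model and constants; all names here are hypothetical.
struct Header {
  uint64_t mark;
  uint64_t klass_word;
};
constexpr int kCompactKlassShift = 32;
constexpr uintptr_t kCommonBase = uintptr_t(0x800000000ULL);  // assumed common base

// Compile-time variant: shift and both switches baked in, base fixed.
template <unsigned Shift, bool Compressed, bool Compact>
uintptr_t decode_fixed(const Header& h) {
  if constexpr (Compact) {
    return kCommonBase + (uintptr_t(uint32_t(h.mark >> kCompactKlassShift)) << Shift);
  } else if constexpr (Compressed) {
    return kCommonBase + (uintptr_t(uint32_t(h.klass_word)) << Shift);
  } else {
    return uintptr_t(h.klass_word);
  }
}

// Runtime parameters, still needed for the fallback and for variant selection.
uintptr_t g_base       = kCommonBase;
unsigned  g_shift      = 3;
bool      g_compact    = false;
bool      g_compressed = true;

// Fallback for uncommon bases or shifts: reads every parameter at runtime.
uintptr_t decode_generic(const Header& h) {
  if (g_compact) {
    return g_base + (uintptr_t(uint32_t(h.mark >> kCompactKlassShift)) << g_shift);
  }
  if (g_compressed) {
    return g_base + (uintptr_t(uint32_t(h.klass_word)) << g_shift);
  }
  return uintptr_t(h.klass_word);
}

using Decoder = uintptr_t (*)(const Header&);

// Picking a variant is itself a runtime switch over the same parameters.
// Only shift 3 is specialized here to keep the sketch short.
Decoder choose_decoder() {
  assert(g_compressed || !g_compact);  // compact headers imply compressed pointers
  if (!g_compressed) return decode_fixed<0, false, false>;
  if (g_base != kCommonBase || g_shift != 3) return decode_generic;
  if (g_compact) return decode_fixed<3, true, true>;
  return decode_fixed<3, true, false>;
}
```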
[1] https://bugs.openjdk.org/secure/attachment/103772/KlassExtraction.txt