Current code in StringUTF16 does this:
@HotSpotIntrinsicCandidate
private static int compress(char[] src, int srcOff, byte[] dst, int dstOff, int len) {
    for (int i = 0; i < len; i++) {
        int c = src[srcOff++];
        if (c >>> 8 != 0) {
            return 0;
        }
        dst[dstOff++] = (byte)c;
    }
    return len;
}

@HotSpotIntrinsicCandidate
public static byte[] toBytes(char[] value, int off, int len) {
    byte[] val = newBytesFor(len);
    for (int i = 0; i < len; i++) {
        putChar(val, i, value[off++]);
    }
    return val;
}
Each of these methods is replaced wholesale by a C2 intrinsic, which performs the vectorization. In C1, however, the quality of the generated code depends on the Java-level choices we make. For example, removing the redundant shift from the compress hot path and replacing it with a plain comparison improves performance. Other code massaging might be in order.
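As a sketch of the kind of massaging meant here (the class name and test values below are illustrative, not part of any actual patch): since a char is an unsigned 16-bit value, the check `(c >>> 8) != 0` is equivalent to `c > 0xFF`, which C1 can compile to a single compare-and-branch instead of a shift followed by a test.

```java
public class CompressSketch {
    // Variant of StringUTF16.compress with the shift replaced by a comparison.
    static int compress(char[] src, int srcOff, byte[] dst, int dstOff, int len) {
        for (int i = 0; i < len; i++) {
            char c = src[srcOff++];
            if (c > 0xFF) {          // was: (c >>> 8) != 0 -- same predicate for a 16-bit char
                return 0;
            }
            dst[dstOff++] = (byte) c;
        }
        return len;
    }

    public static void main(String[] args) {
        char[] latin = "hello".toCharArray();
        byte[] out = new byte[latin.length];
        System.out.println(compress(latin, 0, out, 0, latin.length)); // all Latin-1: returns len

        char[] mixed = {'a', '\u0100'};
        System.out.println(compress(mixed, 0, new byte[2], 0, 2));    // non-Latin-1 char: returns 0
    }
}
```

The observable behavior is unchanged; only the shape of the branch condition differs, which matters for C1 because it emits code close to the Java source rather than replacing the method with an intrinsic.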