Type: Bug
Resolution: Fixed
Priority: P2
Affects Version/s: 9, 11, 17, 18, 19
Resolved In Build: b05
Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-8280636 | 18.0.1 | Claes Redestad | P2 | Resolved | Fixed | b04 |
JDK-8279962 | 18 | Claes Redestad | P2 | Resolved | Fixed | b32 |
JDK-8279985 | 17.0.3-oracle | Dukebot | P2 | Closed | Fixed | b03 |
JDK-8280095 | 17.0.3 | Goetz Lindenmaier | P2 | Resolved | Fixed | b01 |
JDK-8280039 | 11.0.15-oracle | Vladimir Kozlov | P2 | Closed | Fixed | b03 |
JDK-8280701 | 11.0.15 | Goetz Lindenmaier | P2 | Resolved | Fixed | b01 |
```diff
diff --git a/src/java.base/share/classes/java/lang/String.java b/src/java.base/share/classes/java/lang/String.java
index abb35ebaeb1..f84d60f92cc 100644
--- a/src/java.base/share/classes/java/lang/String.java
+++ b/src/java.base/share/classes/java/lang/String.java
@@ -1284,14 +1284,17 @@ public final class String
         int sp = 0;
         int sl = val.length >> 1;
         byte[] dst = new byte[sl * 3];
-        char c;
-        while (sp < sl && (c = StringUTF16.getChar(val, sp)) < '\u0080') {
+        while (sp < sl) {
+            char c = StringUTF16.getChar(val, sp);
+            if (c >= '\u0080') {
+                break;
+            }
             // ascii fast loop;
             dst[dp++] = (byte)c;
             sp++;
         }
         while (sp < sl) {
-            c = StringUTF16.getChar(val, sp++);
+            char c = StringUTF16.getChar(val, sp++);
             if (c < 0x80) {
                 dst[dp++] = (byte)c;
             } else if (c < 0x800) {
```
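To experiment with the patched loop structure outside the JDK, here is a minimal standalone sketch. It is an illustration under stated assumptions, not the JDK code. It reads from a plain `char[]` instead of the internal UTF-16 `byte[]` accessed through `StringUTF16.getChar`. It also always replaces an unpaired surrogate with `'?'`, whereas the JDK method can throw instead. The class name `Utf8EncodeSketch` is hypothetical. As in the diff, the point is that each loop declares its own `char c`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical standalone sketch of the patched encodeUTF8_UTF16 loop shape.
// Assumptions: input is a char[] (not the JDK-internal UTF-16 byte[]), and
// unpaired surrogates are always replaced with '?'.
final class Utf8EncodeSketch {

    static byte[] encodeUtf8(char[] val) {
        int dp = 0;
        int sp = 0;
        int sl = val.length;
        byte[] dst = new byte[sl * 3]; // worst case: 3 bytes per char
        // ASCII fast loop: this `c` is scoped to this loop only.
        while (sp < sl) {
            char c = val[sp];
            if (c >= '\u0080') {
                break;
            }
            dst[dp++] = (byte) c;
            sp++;
        }
        // General loop: declares its own `c`, sharing no local with the loop above.
        while (sp < sl) {
            char c = val[sp++];
            if (c < 0x80) {
                dst[dp++] = (byte) c;
            } else if (c < 0x800) {
                dst[dp++] = (byte) (0xc0 | (c >> 6));
                dst[dp++] = (byte) (0x80 | (c & 0x3f));
            } else if (Character.isSurrogate(c)) {
                if (Character.isHighSurrogate(c) && sp < sl
                        && Character.isLowSurrogate(val[sp])) {
                    // Valid pair: encode the supplementary code point in 4 bytes.
                    int uc = Character.toCodePoint(c, val[sp++]);
                    dst[dp++] = (byte) (0xf0 | (uc >> 18));
                    dst[dp++] = (byte) (0x80 | ((uc >> 12) & 0x3f));
                    dst[dp++] = (byte) (0x80 | ((uc >> 6) & 0x3f));
                    dst[dp++] = (byte) (0x80 | (uc & 0x3f));
                } else {
                    dst[dp++] = (byte) '?'; // unpaired surrogate
                }
            } else {
                dst[dp++] = (byte) (0xe0 | (c >> 12));
                dst[dp++] = (byte) (0x80 | ((c >> 6) & 0x3f));
                dst[dp++] = (byte) (0x80 | (c & 0x3f));
            }
        }
        return Arrays.copyOf(dst, dp);
    }

    public static void main(String[] args) {
        String s = "plain ASCII, then \u00e9\u20ac\uD83D\uDE00";
        byte[] ours = encodeUtf8(s.toCharArray());
        byte[] jdk = s.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(ours, jdk)); // prints true
    }
}
```

Comparing against `String.getBytes(StandardCharsets.UTF_8)` is a cheap way to check that this kind of loop restructuring leaves the output unchanged.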
Results on a few microbenchmarks I'm updating to better stress this code:
Baseline:
```
Benchmark                                      (charsetName)  Mode  Cnt     Score     Error  Units
StringEncode.WithCharset.encodeUTF16                   UTF-8  avgt   15   171.853 ±  10.275  ns/op
StringEncode.WithCharset.encodeUTF16LongEnd            UTF-8  avgt   15  1991.586 ±  82.234  ns/op
StringEncode.WithCharset.encodeUTF16LongStart          UTF-8  avgt   15  8422.458 ± 473.161  ns/op
```
Patch:
```
Benchmark                                      (charsetName)  Mode  Cnt     Score     Error  Units
StringEncode.WithCharset.encodeUTF16                   UTF-8  avgt   15   128.525 ±   6.573  ns/op
StringEncode.WithCharset.encodeUTF16LongEnd            UTF-8  avgt   15  1843.455 ±  72.984  ns/op
StringEncode.WithCharset.encodeUTF16LongStart          UTF-8  avgt   15  4124.791 ± 308.683  ns/op
```
Going back, this seems to have been an issue with this code since its inception with JEP 254 in JDK 9.
The micro encodeUTF16LongEnd encodes a longer string that is mostly ASCII but has a non-ASCII codepoint at the end. This exaggerates the usefulness of the ASCII fast loop, since nearly the whole string is handled there. encodeUTF16LongStart tests the same string but with the non-ASCII codepoint moved to the front, which stresses the non-ASCII loop. The patch helps across the board, but the main improvement is in the microbenchmark that spends its time in the second loop. encodeUTF16LongStart drops from about 8422 to 4125 ns/op, roughly 2x faster. By comparison, encodeUTF16LongEnd improves by about 7% and encodeUTF16 by about 25%.
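The difference between the two long micros comes down to where the ASCII fast loop exits. The sketch below counts how many characters the first loop would consume before handing over to the second. It uses hypothetical inputs: 1000 ASCII `'a'` characters plus one U+20AC, which are not the actual strings in `StringEncode`. It also assumes Compact Strings is enabled (the default), so a string containing U+20AC (outside Latin-1) uses the UTF-16 coder and goes through `encodeUTF8_UTF16`. The class and method names are hypothetical.

```java
// Hypothetical illustration of the LongEnd vs LongStart input shapes.
// Inputs are made up; they only mimic "non-ASCII char at end" vs "at front".
final class InputShapes {

    // Number of leading chars the ASCII fast loop consumes before it breaks
    // out (mirrors the `c >= 0x80` exit in the patched first loop).
    static int asciiPrefixLength(String s) {
        int sp = 0;
        while (sp < s.length()) {
            char c = s.charAt(sp);
            if (c >= 0x80) {
                break;
            }
            sp++;
        }
        return sp;
    }

    public static void main(String[] args) {
        String ascii = "a".repeat(1000);
        String longEnd = ascii + "\u20ac";   // non-ASCII char at the end
        String longStart = "\u20ac" + ascii; // non-ASCII char at the front

        // 1000: almost all work happens in the fast loop.
        System.out.println(asciiPrefixLength(longEnd));
        // 0: all work happens in the general (second) loop.
        System.out.println(asciiPrefixLength(longStart));
    }
}
```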
There's likely a compiler bug hiding in plain sight here where the potentially uninitialized local `char c` messes up the loop optimization of the second loop. I think the above patch is reasonable to put back into the JDK while we investigate if/how C2 can better handle this pattern.
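To make the two code shapes concrete, the sketch below applies them to a simpler task: counting UTF-8 bytes. `utf8LengthShared` keeps one `char c` declared before both loops, as the original code did. `utf8LengthScoped` declares `c` inside each loop, as the patch does. Both names are hypothetical, and the sketch assumes input without unpaired surrogates. It is not a reproducer for the C2 behavior tracked in JDK-8279888; it only shows the pattern.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical side-by-side of the "shared local" and "per-loop local" shapes.
// Not claimed to reproduce the C2 slowdown; both compute the same result.
final class LoopShapes {

    // Before: a single `c` declared ahead of both loops.
    static int utf8LengthShared(String s) {
        int sp = 0;
        int n = 0;
        int sl = s.length();
        char c;
        while (sp < sl && (c = s.charAt(sp)) < 0x80) {
            n++; // ASCII fast loop: one byte per char
            sp++;
        }
        while (sp < sl) {
            c = s.charAt(sp++);
            n += bytesFor(c);
        }
        return n;
    }

    // After: each loop declares its own `c`.
    static int utf8LengthScoped(String s) {
        int sp = 0;
        int n = 0;
        int sl = s.length();
        while (sp < sl) {
            char c = s.charAt(sp);
            if (c >= 0x80) {
                break;
            }
            n++;
            sp++;
        }
        while (sp < sl) {
            char c = s.charAt(sp++);
            n += bytesFor(c);
        }
        return n;
    }

    // UTF-8 bytes contributed by one UTF-16 char. Each half of a surrogate
    // pair counts 2, so a valid pair totals 4. Assumes no unpaired surrogates.
    static int bytesFor(char c) {
        if (c < 0x80) {
            return 1;
        } else if (c < 0x800) {
            return 2;
        } else if (Character.isSurrogate(c)) {
            return 2;
        }
        return 3;
    }

    public static void main(String[] args) {
        String s = "abc\u00e9\u20ac\uD83D\uDE00";
        System.out.println(utf8LengthShared(s)); // 12
        System.out.println(utf8LengthScoped(s)); // 12
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length); // 12
    }
}
```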
Backported by:
- JDK-8279962 Loop optimization issue in String.encodeUTF8_UTF16 (Resolved)
- JDK-8280095 Loop optimization issue in String.encodeUTF8_UTF16 (Resolved)
- JDK-8280636 Loop optimization issue in String.encodeUTF8_UTF16 (Resolved)
- JDK-8280701 Loop optimization issue in String.encodeUTF8_UTF16 (Resolved)
- JDK-8279985 Loop optimization issue in String.encodeUTF8_UTF16 (Closed)
- JDK-8280039 Loop optimization issue in String.encodeUTF8_UTF16 (Closed)
Relates to:
- JDK-8054307 JEP 254: Compact Strings (Closed)
- JDK-8279888 Local variable independently used by multiple loops can interfere with loop optimizations (Resolved)
Links to:
- Commit openjdk/jdk11u-dev/84ed9671
- Commit openjdk/jdk17u-dev/69d296d4
- Commit openjdk/jdk18/ff856593
- Commit openjdk/jdk/c3d0a940
- Review openjdk/jdk11u-dev/791
- Review openjdk/jdk17u-dev/97
- Review openjdk/jdk18/99
- Review openjdk/jdk/7026