-
Enhancement
-
Resolution: Fixed
-
P4
-
6
-
b03
-
x86
-
windows_xp
-
Not verified
A DESCRIPTION OF THE REQUEST :
The replaceAll method was changed so that it doesn't create a StringBuffer when no actual replacements need to be made. There are other instances of unnecessary object creation as well, starting with the replaceFirst method, which should be changed in the same way replaceAll was. But the real offender is the appendReplacement method, which creates another StringBuffer and potentially several Strings every time it's called, when it shouldn't be creating any objects at all.
Currently, appendReplacement consists of three steps. First, it processes
the "replacement" parameter to generate the replacement text, creating a new
StringBuffer for the purpose (even if the replacement string is empty!).
Next, it appends the intervening (non-matched) text to the StringBuffer
("sb") that was passed into it. Finally, it appends the generated text to
"sb". If it were to append the intervening text *first*, it could then use
the passed-in StringBuffer to build the replacement text, eliminating the
need to create another one.
Whenever it encounters a "$g" group reference in the replacement string,
appendReplacement retrieves the captured text by means of the group(int)
method, which always returns a String. If the target text is a String this
won't matter much, but with any other CharSequence implementation, creating
those group strings can get pretty expensive. Instead of appending a String
copy of the captured text, I suggest using the append(CharSequence, int, int)
method that was added to StringBuffer in jdk1.5. (It can also be used in the
"append intervening" step and the appendTail method, to improve readability.)
Here are the affected methods after the proposed changes:
//==== code ========================================================
public Matcher appendReplacement(StringBuffer sb, String replacement) {
// If no match, return error
if (first < 0)
throw new IllegalStateException("No match available");
// Append the intervening text
sb.append(text, lastAppendPosition, first);
// Process substitution string to replace group references with groups
int cursor = 0;
while (cursor < replacement.length()) {
char nextChar = replacement.charAt(cursor);
if (nextChar == '\\') {
cursor++;
nextChar = replacement.charAt(cursor);
sb.append(nextChar);
cursor++;
} else if (nextChar == '$') {
// Skip past $
cursor++;
// The first number is always a group
int refNum = (int)replacement.charAt(cursor) - '0';
if ((refNum < 0)||(refNum > 9))
throw new IllegalArgumentException(
"Illegal group reference");
cursor++;
// Capture the largest legal group string
boolean done = false;
while (!done) {
if (cursor >= replacement.length()) {
break;
}
int nextDigit = replacement.charAt(cursor) - '0';
if ((nextDigit < 0)||(nextDigit > 9)) { // not a number
break;
}
int newRefNum = (refNum * 10) + nextDigit;
if (groupCount() < newRefNum) {
done = true;
} else {
refNum = newRefNum;
cursor++;
}
}
// Append group text without creating a new String
if (start(refNum) != -1)
sb.append(text, start(refNum), end(refNum));
} else {
sb.append(nextChar);
cursor++;
}
}
lastAppendPosition = last;
return this;
}
public StringBuffer appendTail(StringBuffer sb) {
sb.append(text, lastAppendPosition, text.length());
return sb;
}
//==================================================================
public String replaceFirst(String replacement) {
if (replacement == null)
throw new NullPointerException("replacement");
reset();
if (find()) {
StringBuffer sb = new StringBuffer();
appendReplacement(sb, replacement);
appendTail(sb);
return sb.toString();
}
return text.toString();
}
JUSTIFICATION :
In most cases, the proposed changes won't have a noticeable effect, but in
a high-volume process involving many substitutions, I believe the time
savings would be substantial. The changes don't affect the API, and the risk
of introducing bugs is minimal.
###@###.### 2005-2-23 04:29:01 GMT
The replaceAll method was changed so that it doesn't create a StringBuffer when no actual replacements need to be made. There are other instances of unnecessary object creation as well, starting with the replaceFirst method, which should be changed in the same way replaceAll was. But the real offender is the appendReplacement method, which creates another StringBuffer and potentially several Strings every time it's called, when it shouldn't be creating any objects at all.
Currently, appendReplacement consists of three steps. First, it processes
the "replacement" parameter to generate the replacement text, creating a new
StringBuffer for the purpose (even if the replacement string is empty!).
Next, it appends the intervening (non-matched) text to the StringBuffer
("sb") that was passed into it. Finally, it appends the generated text to
"sb". If it were to append the intervening text *first*, it could then use
the passed-in StringBuffer to build the replacement text, eliminating the
need to create another one.
Whenever it encounters a "$g" group reference in the replacement string,
appendReplacement retrieves the captured text by means of the group(int)
method, which always returns a String. If the target text is a String this
won't matter much, but with any other CharSequence implementation, creating
those group strings can get pretty expensive. Instead of appending a String
copy of the captured text, I suggest using the append(CharSequence, int, int)
method that was added to StringBuffer in jdk1.5. (It can also be used in the
"append intervening" step and the appendTail method, to improve readability.)
Here are the affected methods after the proposed changes:
//==== code ========================================================
public Matcher appendReplacement(StringBuffer sb, String replacement) {
// If no match, return error
if (first < 0)
throw new IllegalStateException("No match available");
// Append the intervening text
sb.append(text, lastAppendPosition, first);
// Process substitution string to replace group references with groups
int cursor = 0;
while (cursor < replacement.length()) {
char nextChar = replacement.charAt(cursor);
if (nextChar == '\\') {
cursor++;
nextChar = replacement.charAt(cursor);
sb.append(nextChar);
cursor++;
} else if (nextChar == '$') {
// Skip past $
cursor++;
// The first number is always a group
int refNum = (int)replacement.charAt(cursor) - '0';
if ((refNum < 0)||(refNum > 9))
throw new IllegalArgumentException(
"Illegal group reference");
cursor++;
// Capture the largest legal group string
boolean done = false;
while (!done) {
if (cursor >= replacement.length()) {
break;
}
int nextDigit = replacement.charAt(cursor) - '0';
if ((nextDigit < 0)||(nextDigit > 9)) { // not a number
break;
}
int newRefNum = (refNum * 10) + nextDigit;
if (groupCount() < newRefNum) {
done = true;
} else {
refNum = newRefNum;
cursor++;
}
}
// Append group text without creating a new String
if (start(refNum) != -1)
sb.append(text, start(refNum), end(refNum));
} else {
sb.append(nextChar);
cursor++;
}
}
lastAppendPosition = last;
return this;
}
public StringBuffer appendTail(StringBuffer sb) {
sb.append(text, lastAppendPosition, text.length());
return sb;
}
//==================================================================
public String replaceFirst(String replacement) {
if (replacement == null)
throw new NullPointerException("replacement");
reset();
if (find()) {
StringBuffer sb = new StringBuffer();
appendReplacement(sb, replacement);
appendTail(sb);
return sb.toString();
}
return text.toString();
}
JUSTIFICATION :
In most cases, the proposed changes won't have a noticeable effect, but in
a high-volume process involving many substitutions, I believe the time
savings would be substantial. The changes don't affect the API, and the risk
of introducing bugs is minimal.
###@###.### 2005-2-23 04:29:01 GMT
- relates to
-
JDK-8291598 Matcher.appendReplacement should not create new StringBuilder instances
- Resolved