A DESCRIPTION OF THE REQUEST :
The current implementation of String.equals() performs a character-by-character comparison in cases where a cheaper test could decide the result first. The candidate tests are:
* Identity (already included)
* Count (already included)
* Offset and Value
* Hash Codes
The Identity and Count tests are already in the code and should be kept.
The Offset and Value test compares the offset and value fields of both strings, with value compared by reference. If both match, the two strings refer to the same region of the same buffer (the Count test has already established equal lengths), so they must be equal. If the JVM optimizes String memory usage by making equal Strings share a buffer, this test will short-circuit very frequently.
The Hash Codes test compares the hash codes of both strings. If they differ, the strings cannot be equal, because equal strings always produce the same hash code. Matching hash codes prove nothing, so the character comparison is still needed in that case. The source code provided forces a calculation of the hash code, which itself reads every character of the string. If a string is compared more than once, the calculation pays off because the hash code is cached. If the string is compared only once during its existence, the calculation is wasted.
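The Offset and Value test can be illustrated with a small model. This is only a sketch: `Slice`, `subSlice`, `contentEquals`, and the `charComparisons` counter are hypothetical names invented for the example, and the class merely mimics the value/offset/count layout that the proposed code assumes. It is not java.lang.String.

```java
// Hypothetical stand-in for a String with value/offset/count fields.
final class Slice {
    // Counts character comparisons so the short-circuit is observable (illustration only).
    static int charComparisons = 0;

    final char[] value;
    final int offset;
    final int count;

    Slice(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Returns a view that shares the backing buffer instead of copying it.
    Slice subSlice(int begin, int end) {
        return new Slice(value, offset + begin, end - begin);
    }

    boolean contentEquals(Slice other) {
        if (this == other) return true;            // Identity test
        if (count != other.count) return false;    // Count test
        // Offset and Value test: same buffer (by reference), same start, same length.
        if (offset == other.offset && value == other.value) return true;
        for (int i = 0; i < count; i++) {
            charComparisons++;
            if (value[offset + i] != other.value[other.offset + i]) return false;
        }
        return true;
    }
}

final class SliceDemo {
    public static void main(String[] args) {
        Slice whole = new Slice("hello world".toCharArray(), 0, 11);
        Slice a = whole.subSlice(6, 11);                   // "world", shared buffer
        Slice b = whole.subSlice(6, 11);                   // "world", same buffer and offset
        Slice c = new Slice("world".toCharArray(), 0, 5);  // "world", separate buffer

        System.out.println(a.contentEquals(b));     // true
        System.out.println(Slice.charComparisons);  // 0, no characters were read
        System.out.println(a.contentEquals(c));     // true
        System.out.println(Slice.charComparisons);  // 5, full scan was needed
    }
}
```

The test only helps when two distinct objects actually share a buffer. Equal strings built independently, like `c` above, still fall through to the character loop.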
If forcing the hash code calculation proves to be too expensive, a cheaper route is to compare the hash codes only when both have already been computed and cached (a cached value of 0 means "not computed yet"):
    if ((hash != 0) && (anotherString.hash != 0) && (hash != anotherString.hash))
        return(false);
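A minimal sketch of how the cheaper check behaves, assuming a lazily cached hash field where 0 stands for "not computed yet". `CachedHashText`, `cachedHash`, and `knownDifferentByHash` are hypothetical names used only for this illustration. The hash formula is the usual 31-based polynomial over the characters.

```java
// Hypothetical model of a string-like class with a lazily cached hash code.
final class CachedHashText {
    private final char[] value;
    private int hash; // 0 means the hash has not been computed yet

    CachedHashText(String s) {
        this.value = s.toCharArray();
    }

    // Computes the hash on first use and caches it for later calls.
    int cachedHash() {
        int h = hash;
        if (h == 0) {
            for (char c : value) {
                h = 31 * h + c;
            }
            hash = h;
        }
        return h;
    }

    // The cheaper check: never forces a computation, only uses hashes already cached.
    boolean knownDifferentByHash(CachedHashText other) {
        return hash != 0 && other.hash != 0 && hash != other.hash;
    }
}

final class CachedHashDemo {
    public static void main(String[] args) {
        CachedHashText a = new CachedHashText("abcd");
        CachedHashText b = new CachedHashText("abce");

        // Nothing cached yet, so the check cannot rule anything out.
        System.out.println(a.knownDifferentByHash(b)); // false

        a.cachedHash();
        b.cachedHash();

        // Both hashes are cached and differ, so the strings cannot be equal.
        System.out.println(a.knownDifferentByHash(b)); // true
    }
}
```

A `false` result from this check means "unknown", not "equal", so the character comparison must still follow. A string whose real hash value is 0 is never treated as cached under this scheme, so the check never applies to it.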
JUSTIFICATION :
Comparing every character of two Strings is expensive, especially as the Strings get longer. The Offset and Value test costs only two comparisons, which is very cheap. The first time the Hash Codes test runs on a String, its hash code has to be computed, but every later comparison benefits from the cached value. The gain is most noticeable when the Strings have equal length and differ only in the last character.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Improved performance in String.equals
---------- BEGIN SOURCE ----------
public boolean equals(Object anObject)
{
    String anotherString;
    int n, offset1, offset2;
    char value1[], value2[];

    if (this == anObject)
        return(true); // No-brainer equals test.
    if (!(anObject instanceof String))
        return(false); // Can't be equal if the other object is not a String.
    anotherString = (String) anObject;
    n = count;
    if (n != anotherString.count)
        return(false); // Can't be equal if the lengths don't match.
    offset1 = offset;
    offset2 = anotherString.offset;
    value1 = value;
    value2 = anotherString.value;
    if ((offset1 == offset2) && (value1 == value2))
        return(true); // If both Strings use the same buffer and offset, then they are equal.
    if (hashCode() != anotherString.hashCode())
        return(false); // Equal Strings always have equal hash codes, so a mismatch rules out equality.
    while (n-- != 0)
        if (value1[offset1++] != value2[offset2++])
            return(false);
    return(true);
}
---------- END SOURCE ----------
duplicates: JDK-6932808 (str) Tune equals() of String (Open)
relates to: JDK-6912520 String#equals(Object) should benefit from hash code (Closed)