Type: Bug
Resolution: Fixed
Priority: P3
Fix Version/s: 9
Affects Version/s: None
Component/s: core-libs
Labels:
- autoverify
- jsr379-annex1

Subcomponent:
java.lang
Resolved In Build:
b89
Verification:
Verified

Issue	Fix Version	Assignee	Priority	Status	Resolution	Resolved In Build
JDK-8142810	emb-9	Brent Christian	P3	Resolved	Fixed	team

The spec for String.equalsIgnoreCase() and String.regionMatches(boolean ignoreCase, ...) does not match what the code does.

From the equalsIgnoreCase() JavaDoc:
--
"Two characters c1 and c2 are considered the same ignoring case if at least one of the following is true:
    The two characters are the same (as compared by the == operator)
    Applying the method Character.toUpperCase(char) to each character produces the same result
    Applying the method Character.toLowerCase(char) to each character produces the same result"
--

From regionMatches(boolean ignoreCase, ...):
--
"The result is {@code false} if and only if at least one of the following is true:
...
ignoreCase is true and there is some nonnegative integer k less than len such that:
     Character.toLowerCase(this.charAt(toffset+k)) !=
         Character.toLowerCase(other.charAt(ooffset+k))
and:
     Character.toUpperCase(this.charAt(toffset+k)) !=
             Character.toUpperCase(other.charAt(ooffset+k))"
--

These methods compare Strings one character at a time. The stated procedure for ignoring case is to call toUpperCase() and toLowerCase() for each character in the Strings, and compare the respective results.

However, the code does something slightly different. From regionMatches():
  if (ignoreCase) {
      // If characters don't match but case may be ignored,
      // try converting both characters to uppercase.
      // If the results match, then the comparison scan should
      // continue.
      char u1 = Character.toUpperCase(c1);
      char u2 = Character.toUpperCase(c2);
      if (u1 == u2) {
          continue;
      }
      // Unfortunately, conversion to uppercase does not work properly
      // for the Georgian alphabet, which has strange rules about case
      // conversion. So we need to make one last check before
      // exiting.
      if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
          continue;
      }
  }

After comparing the result of toUpperCase(), toLowerCase() is called not on the original characters, but *on the result of toUpperCase()*.
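The gap between the documented rule and the code can be made concrete with a small sketch. The class and method names below (IgnoreCaseSketch, documentedMatch, implementedMatch) are hypothetical and exist only for illustration. The example pair, U+0131 (dotless i) and U+0130 (capital I with dot above), is my own choice, not taken from this report. It assumes the usual simple case mappings in Character, under which the two rules should disagree for this pair.

```java
final class IgnoreCaseSketch {

    // Rule as written in the equalsIgnoreCase() JavaDoc: both case
    // conversions are applied to the ORIGINAL characters.
    static boolean documentedMatch(char c1, char c2) {
        return c1 == c2
            || Character.toUpperCase(c1) == Character.toUpperCase(c2)
            || Character.toLowerCase(c1) == Character.toLowerCase(c2);
    }

    // Rule as coded in regionMatches(): toLowerCase() is applied to the
    // RESULT of toUpperCase(), not to the original characters.
    static boolean implementedMatch(char c1, char c2) {
        if (c1 == c2) {
            return true;
        }
        char u1 = Character.toUpperCase(c1);
        char u2 = Character.toUpperCase(c2);
        return u1 == u2
            || Character.toLowerCase(u1) == Character.toLowerCase(u2);
    }

    public static void main(String[] args) {
        char dotlessI = '\u0131';        // LATIN SMALL LETTER DOTLESS I
        char dottedCapitalI = '\u0130';  // LATIN CAPITAL LETTER I WITH DOT ABOVE

        System.out.println("documented:  " + documentedMatch(dotlessI, dottedCapitalI));
        System.out.println("implemented: " + implementedMatch(dotlessI, dottedCapitalI));
        System.out.println("String API:  " + "\u0131".equalsIgnoreCase("\u0130"));
    }
}
```

For ordinary pairs such as 'a' and 'A' the two rules agree; they can only diverge for characters whose upper- and lowercase mappings do not round-trip.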

I've not found a specific reason for calling toLowerCase() with the result of toUpperCase(), instead of with the original character (beyond the "Georgian alphabet" comment). But the code has worked like this since JDK 1.0.2, and is consistent with String.compareToIgnoreCase(), added in JDK 1.2.

I presume we did the best we could with the Unicode rules of the time. The long-standing behavior should be maintained for compatibility. Unicode's case mapping rules have evolved over time (addition of SpecialCasing and CaseFolding), as has the Unicode support in the JDK (addition of facilities for context- and locale-aware text handling in java.text).

Over the years, bugs (e.g. JDK-4146417, JDK-4120540) have been filed questioning the Character.toLowerCase(Character.toUpperCase(char)) approach used by equalsIgnoreCase/regionMatches/compareToIgnoreCase. They were all determined to be "Not an Issue". Where the String API does not account for locale/language as people would want or expect, the answer has been to use the locale-sensitive API (java.text, specifically Collator; see JDK-4204589, JDK-4425387, JDK-4120540).

A JavaDoc update for equalsIgnoreCase() and regionMatches() is in order, to something along the lines of String.compareToIgnoreCase():
"...with normalized versions of the strings where case differences have been eliminated by calling Character.toLowerCase(Character.toUpperCase(character)) on each character."
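The normalization wording quoted from compareToIgnoreCase() can be read as the following sketch. NormalizeSketch, normalize and equalsIgnoreCaseSketch are hypothetical names used only here; the real methods compare character by character and do not build intermediate strings, so this illustrates the specified semantics rather than the JDK implementation.

```java
final class NormalizeSketch {

    // Eliminate case differences the way the compareToIgnoreCase() JavaDoc
    // describes: toLowerCase(toUpperCase(c)) for every char in the string.
    static String normalize(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            sb.append(Character.toLowerCase(Character.toUpperCase(c)));
        }
        return sb.toString();
    }

    // Two strings are equal ignoring case if their normalized forms are equal.
    // A null argument is treated as "not equal" in this sketch.
    static boolean equalsIgnoreCaseSketch(String a, String b) {
        return a != null && b != null && normalize(a).equals(normalize(b));
    }

    public static void main(String[] args) {
        System.out.println(normalize("Hello"));
        System.out.println(equalsIgnoreCaseSketch("Hello", "hELLO"));
    }
}
```

Because matching uppercase forms always lowercase to the same character, comparing normalized characters accepts the same pairs as the two-step check in regionMatches().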

It would also be worth adding references to java.text.Collator.
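For readers who need locale-aware comparison, a minimal Collator sketch might look like the following. CollatorSketch and caseInsensitive are hypothetical names, and the choice of SECONDARY strength (which ignores tertiary differences such as case) and Locale.ENGLISH are assumptions for illustration, not something this report prescribes.

```java
import java.text.Collator;
import java.util.Locale;

final class CollatorSketch {

    // Build a collator for the given locale that treats case differences
    // as insignificant. SECONDARY strength still distinguishes accents.
    static Collator caseInsensitive(Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.SECONDARY);
        return collator;
    }

    public static void main(String[] args) {
        Collator collator = caseInsensitive(Locale.ENGLISH);
        // compare() returns 0 when the collator considers the strings equal.
        System.out.println(collator.compare("hello", "HELLO") == 0);
    }
}
```

Unlike String.equalsIgnoreCase(), the result here depends on the locale's collation rules, which is the point of deferring to java.text for language-sensitive text.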

backported by

JDK-8142810 java.lang.String: spec doesn't match impl when ignoring case - equalsIgnoreCase(), regionMatches()

Resolved

Assignee: Brent Christian

Reporter: Brent Christian

Votes: 0

Watchers: 3

Created: 2015-10-02 16:42

Updated: 2017-07-10 23:41

Resolved: 2015-10-27 09:21
