-
Bug
-
Resolution: Fixed
-
P3
-
None
-
None
-
b145
A colleague reports:
-----------
(1) is an observable bug
and (2) is fragile code that only works accidentally; it could easily
cause bugs in future.
I'm attaching two patches:
- LocaleMatcher-bugfixonly.patch fixes (1)
- LocaleMatcher-bugfix-plus-regex-cleanup.patch fixes (1) and
also cleans up (2).
To verify that each patch fixes the issue (1), I have run the test
case below which demonstrates the bug on OpenJDK9. I have not run any
other tests or performed any other verification so my patch for (2)
may not be correct.
Test case that fails against OpenJDK9 but passes after the patch:
=== cut ===
public void testParseConsistency() {
assertEquals(Arrays.asList("ccq-aa", "ybd-aa", "rki-aa"), parseRange("ccq-aa"));
// Second invocation fails with:
// Expected :[ccq-aa, ybd-aa, rki-aa]
// Actual :[ccq-aa, ybd-aa-aa, rki-aa-aa]
assertEquals(Arrays.asList("ccq-aa", "ybd-aa", "rki-aa"), parseRange("ccq-aa"));
}
private List<String> parseRange(String s) {
return LocaleMatcher.parse(s).stream().map(LanguageRange::getRange).collect(Collectors.toList());
}
=== cut ===
===================================================
Issue (1):
LocaleMatcher.getEquivalentsForLanguage(String range) (line 346)
modifies the statically referenced LocaleEquivalentMaps.multiEquivsMap
which is intended to be constant. This is why LocaleMatcher behaves
differently in the second invocation from the test case.
Here's a code snippet of the buggy area in LocaleMatcher.java:
334 private static String[] getEquivalentsForLanguage(String range) {
335 String r = range;
336
337 while (r.length() > 0) {
338 if (LocaleEquivalentMaps.singleEquivMap.containsKey(r)) {
339 String equiv = LocaleEquivalentMaps.singleEquivMap.get(r);
340 // Return immediately for performance if the first matching
341 // subtag is found.
342 return new String[] {range.replaceFirst(r, equiv)};
343 } else if (LocaleEquivalentMaps.multiEquivsMap.containsKey(r)) {
344 String[] equivs = LocaleEquivalentMaps.multiEquivsMap.get(r);
345 for (int i = 0; i < equivs.length; i++) {
346 equivs[i] = range.replaceFirst(r, equivs[i]);
347 }
348 return equivs;
349 }
===================================================
Issue (2):
A few lines in LocaleMatcher.java (including the one with the bug)
call String.replaceFirst(a, b) for values a that are literal Strings,
thus they should be String.replaceFirst(Pattern.quote(a), b) or some
other literal String matching logic.
Other lines in LocaleMatcher use String.replaceAll() that could
be rewritten to faster and more readable String.replace().
Specifically:
- there are two occurrences of
range.replaceAll("\\x2A", "\\\\p{Alnum}*");
Because unicode codepoint # 0x2A is '*', this is equivalent to:
range.replace("*", "\\\\p{Alnum}*");
which is both more readable and faster.
- LocaleMatcher.filterTags() contains the line:
range = range.replaceAll("-[*]", "");
which I believe is exactly equivalent to the faster:
range = range.replace("-*", "");
- LocaleMatcher.getEquivalentForRegionAndVariant(String) in line 376
reads:
return range.replaceFirst(subtag, LocaleEquivalentMaps.regionVariantEquivMap.get(subtag));
but subtag is a literal string rather than a regex. This happens
to work because it will only consist of characters from [-a-z]
but this is fragile.
- LocaleMatcher.getEquivalentsForLanguage(String range) lines
342 and 346 call range.replaceFirst(r, ...). The value of r
here is again a literal string. It happens to work because r
contains only alphanumeric characters but it is fragile.
-----------
(1) is an observable bug
and (2) is fragile code that only works accidentally; it could easily
cause bugs in future.
I'm attaching two patches:
- LocaleMatcher-bugfixonly.patch fixes (1)
- LocaleMatcher-bugfix-plus-regex-cleanup.patch fixes (1) and
also cleans up (2).
To verify that each patch fixes the issue (1), I have run the test
case below which demonstrates the bug on OpenJDK9. I have not run any
other tests or performed any other verification so my patch for (2)
may not be correct.
Test case that fails against OpenJDK9 but passes after the patch:
=== cut ===
public void testParseConsistency() {
assertEquals(Arrays.asList("ccq-aa", "ybd-aa", "rki-aa"), parseRange("ccq-aa"));
// Second invocation fails with:
// Expected :[ccq-aa, ybd-aa, rki-aa]
// Actual :[ccq-aa, ybd-aa-aa, rki-aa-aa]
assertEquals(Arrays.asList("ccq-aa", "ybd-aa", "rki-aa"), parseRange("ccq-aa"));
}
private List<String> parseRange(String s) {
return LocaleMatcher.parse(s).stream().map(LanguageRange::getRange).collect(Collectors.toList());
}
=== cut ===
===================================================
Issue (1):
LocaleMatcher.getEquivalentsForLanguage(String range) (line 346)
modifies the statically referenced LocaleEquivalentMaps.multiEquivsMap
which is intended to be constant. This is why LocaleMatcher behaves
differently in the second invocation from the test case.
Here's a code snippet of the buggy area in LocaleMatcher.java:
334 private static String[] getEquivalentsForLanguage(String range) {
335 String r = range;
336
337 while (r.length() > 0) {
338 if (LocaleEquivalentMaps.singleEquivMap.containsKey(r)) {
339 String equiv = LocaleEquivalentMaps.singleEquivMap.get(r);
340 // Return immediately for performance if the first matching
341 // subtag is found.
342 return new String[] {range.replaceFirst(r, equiv)};
343 } else if (LocaleEquivalentMaps.multiEquivsMap.containsKey(r)) {
344 String[] equivs = LocaleEquivalentMaps.multiEquivsMap.get(r);
345 for (int i = 0; i < equivs.length; i++) {
346 equivs[i] = range.replaceFirst(r, equivs[i]);
347 }
348 return equivs;
349 }
===================================================
Issue (2):
A few lines in LocaleMatcher.java (including the one with the bug)
call String.replaceFirst(a, b) for values a that are literal Strings,
thus they should be String.replaceFirst(Pattern.quote(a), b) or some
other literal String matching logic.
Other lines in LocaleMatcher use String.replaceAll() that could
be rewritten to faster and more readable String.replace().
Specifically:
- there are two occurrences of
range.replaceAll("\\x2A", "\\\\p{Alnum}*");
Because unicode codepoint # 0x2A is '*', this is equivalent to:
range.replace("*", "\\\\p{Alnum}*");
which is both more readable and faster.
- LocaleMatcher.filterTags() contains the line:
range = range.replaceAll("-[*]", "");
which I believe is exactly equivalent to the faster:
range = range.replace("-*", "");
- LocaleMatcher.getEquivalentForRegionAndVariant(String) in line 376
reads:
return range.replaceFirst(subtag, LocaleEquivalentMaps.regionVariantEquivMap.get(subtag));
but subtag is a literal string rather than a regex. This happens
to work because it will only consist of characters from [-a-z]
but this is fragile.
- LocaleMatcher.getEquivalentsForLanguage(String range) lines
342 and 346 call range.replaceFirst(r, ...). The value of r
here is again a literal string. It happens to work because r
contains only alphanumeric characters but it is fragile.
- relates to
-
JDK-8170889 LocaleMatcher case conversion issues
-
- Closed
-