A DESCRIPTION OF THE PROBLEM :
Currently, String.codePointCount has only one overload, which takes begin and end indices. Developers would naturally expect a no-argument overload that counts the code points in the entire string.
Two workarounds exist today:
1. str.codePoints().count()
2. str.codePointCount(0, str.length())
However, workaround 1 does extra work (it builds an IntStream that yields every code point in the string), while workaround 2 (a) requires the string to be assigned to a variable first, (b) makes the source code more verbose, and (c) performs an extra bounds check (https://github.com/openjdk/jdk/blob/0735dc27c71de46896afd2f0f608319304a3d549/src/java.base/share/classes/java/lang/String.java#L1698C9-L1698C66).
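For comparison, the two workarounds can be run side by side. This is a minimal sketch: the class name `CodePointCountDemo` and the methods `loadUserName`, `viaStream`, and `viaRange` are hypothetical, and `viaRange` simply wraps the existing `codePointCount(int, int)` overload, so it keeps that overload's range check.

```java
public class CodePointCountDemo {

    // Hypothetical stand-in for a method whose result we want to measure.
    static String loadUserName() {
        return "\uD883\uDEDE\uD883\uDEDE\u9EBA"; // "𰻞𰻞麺": two surrogate pairs + one BMP char
    }

    // Workaround 1: build an IntStream of code points and count its elements.
    static long viaStream(String s) {
        return s.codePoints().count();
    }

    // Workaround 2: pass the full range to the existing two-argument overload,
    // which still validates the (always valid) range 0..length().
    static int viaRange(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // Workaround 2 needs the string twice, hence the local variable.
        String name = loadUserName();
        System.out.println(name.length());   // 5 (UTF-16 chars)
        System.out.println(viaStream(name)); // 3
        System.out.println(viaRange(name));  // 3
    }
}
```

Both approaches return 3 here, while `length()` returns 5 because each supplementary character occupies two UTF-16 chars.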
=========
Use cases:
=========
if (userName.codePointCount() > 20) {
    IO.println("The user name is too long to store in VARCHAR(20) in utf8mb4 MySQL!");
}
// https://pages.nist.gov/800-63-4/sp800-63b.html#passwordver
// Verifiers and CSPs SHALL require passwords to be a minimum of eight characters in length
// Each Unicode code point SHALL be counted as a single character when evaluating password length.
if (password.codePointCount() < 8) {
    IO.println("Password is too short!");
}
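Until such an overload exists, the same checks have to be written with the two-argument overload. The following is a sketch under that assumption; the class `InputChecks`, its method names, and its constants are hypothetical, with the limits taken from the use cases above (20 characters for VARCHAR(20) in utf8mb4, 8 characters per the NIST text).

```java
public class InputChecks {

    // Limit from the VARCHAR(20) use case above, counted in characters (code points).
    static final int MAX_USER_NAME_CODE_POINTS = 20;
    // Minimum from the NIST use case above: each code point counts as one character.
    static final int MIN_PASSWORD_CODE_POINTS = 8;

    // Counts every code point in s using the existing two-argument overload.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    static boolean isUserNameTooLong(String userName) {
        return codePoints(userName) > MAX_USER_NAME_CODE_POINTS;
    }

    static boolean isPasswordTooShort(String password) {
        return codePoints(password) < MIN_PASSWORD_CODE_POINTS;
    }

    public static void main(String[] args) {
        // Four surrogate pairs: 8 UTF-16 chars, but only 4 code points.
        String password = "\uD883\uDEDE".repeat(4);
        System.out.println(password.length());            // 8
        System.out.println(isPasswordTooShort(password)); // true
    }
}
```

The example password shows why `length()` is not a substitute: it reports 8 chars, yet the password holds only 4 code points and fails the minimum.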
ACTUAL BEHAVIOR :
| Welcome to JShell -- Version 26-ea
| For an introduction type: /help intro
jshell> var str = "𰻞𰻞麺";
str ==> "𰻞𰻞麺"
jshell> str.codePointCount()
| Error:
| method codePointCount in class java.lang.String cannot be applied to given types;
| required: int,int
| found: no arguments
| reason: actual and formal argument lists differ in length
| jshell> str.codePointCount()
| ^----------------^
jshell> str.codePointCount(
Signatures:
int String.codePointCount(int beginIndex, int endIndex)
<press tab again to see documentation>
jshell> str.codePointCount(
int String.codePointCount(int beginIndex, int endIndex)
Returns the number of Unicode code points in the specified text range of this String. The text
range begins at the specified beginIndex and extends to the char at index endIndex - 1. Thus
the length (in chars) of the text range is endIndex-beginIndex. Unpaired surrogates within
the text range count as one code point each.
Parameters:
beginIndex - the index to the first char of the text range.
endIndex - the index after the last char of the text range.
Returns:
the number of Unicode code points in the specified text range
Thrown Exceptions:
IndexOutOfBoundsException - if the beginIndex is negative, or endIndex is larger than the
length of this String, or beginIndex is larger than endIndex.
<press tab again to see all possible completions; total possible completions: 1,408>
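The documented behavior above can be checked against the same string. This small sketch (the class name `CodePointCountRanges` is illustrative) shows that the full range yields 3 code points for 5 chars, that a range cutting through a surrogate pair counts the lone surrogate as one code point, and that an endIndex past `length()` throws IndexOutOfBoundsException.

```java
public class CodePointCountRanges {

    // "𰻞𰻞麺" written with escapes: two surrogate pairs followed by one BMP char.
    static final String STR = "\uD883\uDEDE\uD883\uDEDE\u9EBA";

    public static void main(String[] args) {
        // Full range: 5 chars, 3 code points.
        System.out.println(STR.codePointCount(0, STR.length())); // 3

        // Range ends inside the first pair: the lone high surrogate counts as one.
        System.out.println(STR.codePointCount(0, 1)); // 1

        // Range starts at a low surrogate: unpaired "\uDEDE" + one pair + "\u9EBA".
        System.out.println(STR.codePointCount(1, STR.length())); // 3

        // endIndex larger than length() is rejected.
        try {
            STR.codePointCount(0, STR.length() + 1);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("IndexOutOfBoundsException");
        }
    }
}
```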
LINKS TO :
Review(master) openjdk/jdk/26461