Enhancement
Resolution: Fixed
P3
8, 9
b145
generic
generic
A DESCRIPTION OF THE REQUEST :
While profiling the script editor in NetBeans, I noticed that one of the performance hot spots comes from the Nashorn parser, in particular from two methods:
http://hg.openjdk.java.net/jdk9/dev/nashorn/file/a46b7d386795/src/jdk.scripting.nashorn/share/classes/jdk/nashorn/internal/parser/Lexer.java#l386
(the code is the same in both JDK 8 and JDK 9)
    public static boolean isJSWhitespace(final char ch) {
        return JAVASCRIPT_WHITESPACE.indexOf(ch) != -1;
    }
    public static boolean isJSEOL(final char ch) {
        return JAVASCRIPT_WHITESPACE_EOL.indexOf(ch) != -1;
    }
These methods are called very frequently, but even the most typical check (ch == ' ') goes through the rather complex String.indexOf():
    public int indexOf(int ch, int fromIndex) {
        final int max = value.length;
        if (fromIndex < 0) {
            fromIndex = 0;
        } else if (fromIndex >= max) {
            // Note: fromIndex might be near -1>>>1.
            return -1;
        }
        if (ch < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
            // handle most cases here (ch is a BMP code point or a
            // negative value (invalid code point))
            final char[] value = this.value;
            for (int i = fromIndex; i < max; i++) {
                if (value[i] == ch) {
                    return i;
                }
            }
            return -1;
        } else {
            return indexOfSupplementary(ch, fromIndex);
        }
    }
JUSTIFICATION :
To improve the performance of the Nashorn JavaScript parser.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
I expect the following to perform better:
    public static boolean isJSWhitespace(final char ch) {
        return ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r'
            || JAVASCRIPT_OTHER_WHITESPACE.indexOf(ch) != -1;
    }
    public static boolean isJSEOL(final char ch) {
        return ch == '\n' || ch == '\r'
            || ch == '\u2028' // line separator
            || ch == '\u2029' // paragraph separator
            ;
    }
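The proposal above can be tried outside Nashorn with a small standalone sketch. Everything named here is an assumption made for illustration: the class WhitespaceFastPath, the contents of OTHER_WHITESPACE (an approximation of the less common characters in the Lexer's whitespace table, not copied from it), and the helper countMismatches, which checks that the fast-path variant agrees with the plain indexOf variant for every char value.

```java
public final class WhitespaceFastPath {

    // Assumed for illustration: the less common whitespace and line
    // terminator characters, i.e. everything except space, tab, LF and CR.
    private static final String OTHER_WHITESPACE =
        "\u2028\u2029\u000b\u000c\u00a0\u1680\u180e"
        + "\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a"
        + "\u202f\u205f\u3000\ufeff";

    // Full table, used only by the indexOf-only variant below.
    private static final String ALL_WHITESPACE = " \t\n\r" + OTHER_WHITESPACE;

    // Variant in the style of the current code: every call scans the table.
    public static boolean isJSWhitespaceSlow(final char ch) {
        return ALL_WHITESPACE.indexOf(ch) != -1;
    }

    // Proposed variant: compare against the common characters first and
    // fall back to the table scan only when none of them matches.
    public static boolean isJSWhitespaceFast(final char ch) {
        return ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r'
            || OTHER_WHITESPACE.indexOf(ch) != -1;
    }

    // Proposed end-of-line check: four direct comparisons, no table scan.
    public static boolean isJSEOL(final char ch) {
        return ch == '\n' || ch == '\r' || ch == '\u2028' || ch == '\u2029';
    }

    // Counts the char values for which the two whitespace variants disagree.
    public static int countMismatches() {
        int mismatches = 0;
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            if (isJSWhitespaceSlow((char) c) != isJSWhitespaceFast((char) c)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(final String[] args) {
        System.out.println("mismatches: " + countMismatches());
    }
}
```

Note that in this sketch a non-whitespace character such as a letter still falls through to the indexOf scan of the shorter table, so the direct comparisons mainly help for the common whitespace characters themselves.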
PS: I'm not sure that all of the Unicode characters listed in JAVASCRIPT_WHITESPACE make sense. Most of them are meaningful only to text processors, such as spaces of different typographic widths. A JavaScript source file is typically plain text, with special characters occurring only within string literals, and characters inside string literals are not scanned as whitespace.