Type: Enhancement
Resolution: Fixed
Priority: P3
Fix Version/s: 9
Affects Version/s: 8, 9
Component/s: core-libs
Labels:

Subcomponent: jdk.nashorn
Resolved In Build: b145
CPU: generic
OS: generic

A DESCRIPTION OF THE REQUEST :
While profiling the script editor in NetBeans, I noticed that one of the performance hot spots is in the Nashorn parser, in two methods in particular:

http://hg.openjdk.java.net/jdk9/dev/nashorn/file/a46b7d386795/src/jdk.scripting.nashorn/share/classes/jdk/nashorn/internal/parser/Lexer.java#l386
(code is the same for both Jdk8 and Jdk9)

    public static boolean isJSWhitespace(final char ch) {
        return JAVASCRIPT_WHITESPACE.indexOf(ch) != -1;
    }

    public static boolean isJSEOL(final char ch) {
        return JAVASCRIPT_WHITESPACE_EOL.indexOf(ch) != -1;
    }

These methods are called very frequently, but even the most typical check (ch == ' ') goes through the rather complex String.indexOf():

    public int indexOf(int ch, int fromIndex) {
        final int max = value.length;
        if (fromIndex < 0) {
            fromIndex = 0;
        } else if (fromIndex >= max) {
            // Note: fromIndex might be near -1>>>1.
            return -1;
        }

        if (ch < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
            // handle most cases here (ch is a BMP code point or a
            // negative value (invalid code point))
            final char[] value = this.value;
            for (int i = fromIndex; i < max; i++) {
                if (value[i] == ch) {
                    return i;
                }
            }
            return -1;
        } else {
            return indexOfSupplementary(ch, fromIndex);
        }
    }

JUSTIFICATION :
To improve the performance of the Nashorn JavaScript parser.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
I expect the following to be better in terms of performance:

    public static boolean isJSWhitespace(final char ch) {
        return ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r'
            || JAVASCRIPT_OTHER_WHITESPACE.indexOf(ch) != -1;
    }

    public static boolean isJSEOL(final char ch) {
        return ch == '\n' || ch == '\r'
            || ch == '\u2028' // line separator
            || ch == '\u2029' // paragraph separator
            ;
    }
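A fast path like this is only safe if it returns the same result as the original indexOf()-based check for every possible char. The sketch below is one way to verify that exhaustively. It is not the Nashorn code: the class name WhitespaceCheck, the method names, and the contents of the three string constants are assumptions made for illustration, and the real JAVASCRIPT_WHITESPACE constant in Lexer.java may list different characters.

```java
final class WhitespaceCheck {
    // Hypothetical constants: an illustrative subset, not copied from Lexer.java.
    static final String EOL = "\n\r\u2028\u2029";
    // Everything that the fast path does not test with a direct comparison.
    static final String OTHER_WHITESPACE =
        "\u2028\u2029\u000b\u000c\u00a0\u1680\u2000\u3000\ufeff";
    static final String ALL_WHITESPACE = " \t\n\r" + OTHER_WHITESPACE;

    // Baseline: every call scans the string, even for a plain space.
    static boolean isWhitespaceSlow(final char ch) {
        return ALL_WHITESPACE.indexOf(ch) != -1;
    }

    // Fast path: compare against the common ASCII characters first and
    // fall back to indexOf() only for the rare remaining characters.
    static boolean isWhitespaceFast(final char ch) {
        return ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r'
            || OTHER_WHITESPACE.indexOf(ch) != -1;
    }

    static boolean isEolSlow(final char ch) {
        return EOL.indexOf(ch) != -1;
    }

    static boolean isEolFast(final char ch) {
        return ch == '\n' || ch == '\r'
            || ch == '\u2028'  // line separator
            || ch == '\u2029'; // paragraph separator
    }

    // Counts the char values on which the slow and fast variants disagree.
    // The loop variable is an int so that it can run past Character.MAX_VALUE
    // and terminate instead of wrapping around.
    static int countMismatches() {
        int mismatches = 0;
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            final char ch = (char) c;
            if (isWhitespaceSlow(ch) != isWhitespaceFast(ch)) {
                mismatches++;
            }
            if (isEolSlow(ch) != isEolFast(ch)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(final String[] args) {
        System.out.println("mismatches: " + countMismatches());
    }
}
```

Because char has only 65536 values, checking all of them is cheap, so the same loop could be reused as a regression test if the whitespace set ever changes.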

PS: I'm actually not sure that all the Unicode characters listed in JAVASCRIPT_WHITESPACE make sense. Most of them are meaningful only to text processors, such as spaces of different typographic widths. A JavaScript source file is typically plain text, with special characters occurring only within string literals, and characters inside string literals are not treated as whitespace.

Assignee: Hannes Wallnoefer

Reporter: Webbug Group

Votes: 0

Watchers: 4

Created: 2016-10-16 01:26

Updated: 2016-11-17 10:33

Resolved: 2016-11-11 09:58
