Type: Enhancement
Resolution: Fixed
Priority: P4
Fix Version/s: 9
Affects Version/s: None
Component/s: core-libs
Labels: None
Resolved In Build: b119
CPU: generic
OS: generic

(1) Pull the "broken" printNodeTree (a debugging aid) out of Pattern. It has not worked as expected for a while. Replace it with a working version in a separate class, j.u.regex.PrintPattern, which can now print a clean and complete node tree of the pattern. For example,

   Pattern: [a-z0-9]+|ABCDEFG
     0: <Start>
     1: <Branch>
     2: <CharPropertyGreedy +>
     3: <Union>
     4: <Range[a-z]>
     5: <Range[0-9]>
         <-branch.separator->
     6: <Slice "ABCDEFG">
     7: </Branch>
     8: <END>

(2) Optimize greedy repetition of a "CharProperty": a greedy repetition on a single "CharProperty", such as \p{IsGreek}+ or the very commonly used .*, is now parsed into a single, smooth loop node.

from

    Pattern: \p{IsGreek}+
     0: <Start>
     1: <Curly GREEDY + >
     2: <Script GREEK>
         </Curly>
     3: <END>

to

     Pattern: \p{IsGreek}+
     0: <Start>
     1: <CharPropertyGreedy Script GREEK+>
     2: <END>

   The simple jmh benchmark [2] indicates it is about 50%+ faster, especially in the no-match case.
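
The idea behind the single loop node can be sketched roughly as follows. This is only an illustration, not the code in Pattern: the class GreedyLoopSketch, the Next interface and the matchGreedy method are hypothetical names, and java.util.function.IntPredicate stands in for the engine's internal "CharProperty". The loop consumes as many matching code points as it can, then backs off one code point at a time until the rest of the pattern matches.

```java
import java.util.function.IntPredicate;
import java.util.regex.Pattern;

class GreedyLoopSketch {
    /** Hypothetical stand-in for "the rest of the pattern" after the loop. */
    interface Next {
        boolean match(CharSequence seq, int i);
    }

    /** Stand-in for \p{IsGreek}: true if the code point is in the Greek script. */
    static final IntPredicate GREEK =
        cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.GREEK;

    /**
     * Greedy repetition of a single character property (assumed sketch).
     * cmin is the minimum repeat count: 1 for "+", 0 for "*".
     */
    static boolean matchGreedy(IntPredicate prop, int cmin, Next next,
                               CharSequence seq, int i) {
        int n = 0;
        // Phase 1: consume as many matching code points as possible.
        while (i < seq.length()) {
            int ch = Character.codePointAt(seq, i);
            if (!prop.test(ch)) {
                break;
            }
            i += Character.charCount(ch);
            n++;
        }
        // Phase 2: try the rest of the pattern, backing off one code point
        // at a time while at least cmin repetitions remain.
        while (n >= cmin) {
            if (next.match(seq, i)) {
                return true;
            }
            if (n == cmin) {
                break;
            }
            i -= Character.charCount(Character.codePointBefore(seq, i));
            n--;
        }
        return false;
    }

    public static void main(String[] args) {
        Next atEnd = (seq, i) -> i == seq.length();
        String greek = "\u03b1\u03b2\u03b3"; // alpha, beta, gamma
        System.out.println(matchGreedy(GREEK, 1, atEnd, greek, 0)); // true
        System.out.println(Pattern.matches("\\p{IsGreek}+", greek)); // true
        System.out.println(matchGreedy(GREEK, 1, atEnd, "abc", 0)); // false
    }
}
```

In the no-match case the first loop stops at once and nothing else is tried, which is consistent with the no-match case benefiting most in the benchmark.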

(3) Optimize the "union" of individual "char"s inside a character class [...], such as [ABCDEF]. For a regex like [a-zABCDEF], the engine currently generates nodes like

   Pattern: [a-zABCDEF]
     0: <Start>
     1: <Union>
     2: <Union>
     3: <Union>
     4: <Union>
     5: <Union>
     6: <Union>
     7: <Range[a-z]>
     8: <Bits [ A B C D E F]>
     8: <Bits [ A B C D E F]>
     8: <Bits [ A B C D E F]>
     8: <Bits [ A B C D E F]>
     8: <Bits [ A B C D E F]>
     8: <Bits [ A B C D E F]>
     9: <END>

With the optimization it generates (as it should)

   Pattern: [a-zABCDEF]
     0: <Start>
     1: <Union>
     2: <Range[a-z]>
     3: <Bits [ A B C D E F]>
     4: <END>

   The jmh benchmark [2] also indicates it is much faster, especially in the no-match case.
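
A rough sketch of why collapsing the unions helps, under stated assumptions: BitsUnionSketch, chainedUnion and bits are hypothetical names, java.util.function.IntPredicate stands in for the engine's character-class nodes, and the 256-entry table assumes the individual chars are all below 256. The chained form tests each char in turn; the table form answers with one array lookup.

```java
import java.util.function.IntPredicate;

class BitsUnionSketch {
    /** One predicate per char, chained with or(): one comparison per member. */
    static IntPredicate chainedUnion(String chars) {
        IntPredicate p = ch -> false;
        for (int i = 0; i < chars.length(); i++) {
            final int c = chars.charAt(i);
            p = p.or(ch -> ch == c);
        }
        return p;
    }

    /** All chars folded into one lookup table (assumes every char is < 256). */
    static IntPredicate bits(String chars) {
        final boolean[] table = new boolean[256];
        for (int i = 0; i < chars.length(); i++) {
            char c = chars.charAt(i);
            if (c >= 256) {
                throw new IllegalArgumentException("char out of table range: " + (int) c);
            }
            table[c] = true;
        }
        return ch -> ch >= 0 && ch < 256 && table[ch];
    }

    public static void main(String[] args) {
        // [a-zABCDEF] as a single union of a range and one table.
        IntPredicate range = ch -> 'a' <= ch && ch <= 'z';
        IntPredicate cls = range.or(bits("ABCDEF"));
        System.out.println(cls.test('C')); // true
        System.out.println(cls.test('q')); // true
        System.out.println(cls.test('G')); // false
    }
}
```

Both forms accept the same characters; only the cost of a failed test differs, since the chained form has to try every member before it can say no.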

(4) Replace the "constant" CharProperty nodes with a simple functional interface/lambda. The change reduces the number of classes in the package (anonymous classes) from 130+ to < 70.
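
As an illustration of that approach, a minimal sketch: the CharPredicate interface and the helper names below are assumptions made for this example and are not taken from the patch. Each "constant" property becomes a lambda or method reference, and the combinators are default methods, so no per-property class is needed.

```java
class CharPredicateSketch {
    /** Hypothetical functional interface: one test on a code point. */
    @FunctionalInterface
    interface CharPredicate {
        boolean is(int ch);

        default CharPredicate union(CharPredicate p) {
            return ch -> is(ch) || p.is(ch);
        }

        default CharPredicate and(CharPredicate p) {
            return ch -> is(ch) && p.is(ch);
        }

        default CharPredicate negate() {
            return ch -> !is(ch);
        }
    }

    /** Inclusive code point range, e.g. range('a', 'z') for [a-z]. */
    static CharPredicate range(int lower, int upper) {
        return ch -> lower <= ch && ch <= upper;
    }

    // "Constant" properties as lambdas or method references
    // instead of one anonymous class each.
    static final CharPredicate ALL = ch -> true;
    static final CharPredicate ASCII_DIGIT = range('0', '9');
    static final CharPredicate UPPER = Character::isUpperCase;

    public static void main(String[] args) {
        // [a-z0-9] and its negation [^a-z0-9]
        CharPredicate cls = range('a', 'z').union(ASCII_DIGIT);
        System.out.println(cls.is('k'));          // true
        System.out.println(cls.is('7'));          // true
        System.out.println(cls.negate().is('K')); // true
    }
}
```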

Oh, and there is one more:
(5) fix the change for the "j.u.regex: Negated Character Classes" [3]

[1] http://mail.openjdk.java.net/pipermail/core-libs-dev/2016-March/039269.html
[2] http://cr.openjdk.java.net/~sherman/regexClosure/MyBenchmark.java
[3] http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-June/006957.html

relates to

JDK-8180450 secondary_super_cache does not scale well (Resolved)
JDK-8196765 regex pattern working for JDK 8 does not work in JDK 9 (Closed)

Assignee: Xueming Shen
Reporter: Xueming Shen
Votes: 0
Watchers: 2

Created: 2016-03-08 22:42
Updated: 2022-11-02 06:43
Resolved: 2016-05-10 21:23
