-
Type:
Enhancement
-
Resolution: Unresolved
-
Priority:
P4
-
None
-
Affects Version/s: None
-
Component/s: hotspot
-
generic
-
generic
A DESCRIPTION OF THE PROBLEM :
Issue: Inefficient Separator Check Leading to Quadratic Complexity
Location: src/jdk.jfr/share/classes/jdk/jfr/internal/util/Tokenizer.java
The performance bottleneck originates in the private boolean isSeparator(char c) method. This method determines if a character is a separator by iterating through the separators character array one element at a time. If the array has a length of K_sep, each call to isSeparator() takes O(K_sep) time in the worst case (when the character is not a separator).
Methods like next() call isSeparator() repeatedly, once for each character in the input text until a token is formed. For an input text of length N_text, the cumulative time complexity for tokenizing the text becomes O(N_text * K_sep).
This design becomes a significant performance issue when an application uses the Tokenizer with a large number of separators and processes a long string. If an attacker or user can control both the input text and the separator list, they can construct a worst-case scenario. For example, providing a long text (N_text) composed of characters that are not in a long separator list (K_sep) forces the isSeparator() method to complete a full scan for almost every character in the text.
When N_text and K_sep are proportional, this behavior degrades to O(N^2). The provided proof-of-concept (dostest.java) effectively demonstrates this. It creates a long string of character 'a' and a long separator list of character 'b'. The test results show that as the size (N) of the text and separator list doubles, the execution time increases by approximately a factor of four, confirming the quadratic performance degradation.
Issue: Inefficient Separator Check Leading to Quadratic Complexity
Location: src/jdk.jfr/share/classes/jdk/jfr/internal/util/Tokenizer.java
The performance bottleneck originates in the private boolean isSeparator(char c) method. This method determines if a character is a separator by iterating through the separators character array one element at a time. If the array has a length of K_sep, each call to isSeparator() takes O(K_sep) time in the worst case (when the character is not a separator).
Methods like next() call isSeparator() repeatedly, once for each character in the input text until a token is formed. For an input text of length N_text, the cumulative time complexity for tokenizing the text becomes O(N_text * K_sep).
This design becomes a significant performance issue when an application uses the Tokenizer with a large number of separators and processes a long string. If an attacker or user can control both the input text and the separator list, they can construct a worst-case scenario. For example, providing a long text (N_text) composed of characters that are not in a long separator list (K_sep) forces the isSeparator() method to complete a full scan for almost every character in the text.
When N_text and K_sep are proportional, this behavior degrades to O(N^2). The provided proof-of-concept (dostest.java) effectively demonstrates this. It creates a long string of character 'a' and a long separator list of character 'b'. The test results show that as the size (N) of the text and separator list doubles, the execution time increases by approximately a factor of four, confirming the quadratic performance degradation.