An analysis of the `parse*()` family of methods in `j.l.Integer` and `j.l.Long` shows the following shortcomings:
* The code in `parse*(CharSequence,int,int,int)` might read the initial character twice. While the specification is clear that concurrent modifications are not guarded against, it is almost trivial to ensure that each character is read at most once, from lower to higher indices. With this provision, the implementation can ensure that the input is parsed _as if_ it were first copied to a private, unshared buffer which is then processed. The copying itself can still be subject to concurrency hazards, but the overall semantics are clearer: once the character at a position has been read, it does not appear to change anymore.
* The `parseUnsignedInt()` methods currently rely on both `parseInt()` and `parseLong()`. When the latter is used, the input is read a second time. Not only is this wasteful, but for `CharSequence` inputs it is more hazardous than necessary (see above).
* The code in `parseUnsignedLong()` has a special path with a long comment that requires some sophistication to understand. The code is fundamentally different from its `parseUnsignedInt()` counterparts.
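One way to observe how often each position is read is to wrap the input in a `CharSequence` that counts `charAt()` calls. The following is only an illustrative sketch: `ReadCountDemo` and `CountingSequence` are hypothetical names introduced here, not part of the JDK, and the counts printed depend on the JDK version in use.

```java
import java.util.Arrays;

public class ReadCountDemo {

    /** Hypothetical wrapper (not a JDK class) that counts charAt() calls per index. */
    static final class CountingSequence implements CharSequence {
        private final String value;
        final int[] reads;

        CountingSequence(String value) {
            this.value = value;
            this.reads = new int[value.length()];
        }

        @Override public int length() { return value.length(); }

        @Override public char charAt(int index) {
            reads[index]++;                 // record every read of this position
            return value.charAt(index);
        }

        @Override public CharSequence subSequence(int start, int end) {
            return value.substring(start, end);
        }

        @Override public String toString() { return value; }
    }

    public static void main(String[] args) {
        CountingSequence cs = new CountingSequence("+12345");
        int parsed = Integer.parseInt(cs, 0, cs.length(), 10);
        // A count greater than 1 at some index means that position was read more than once.
        System.out.println(parsed + " reads per index: " + Arrays.toString(cs.reads));
    }
}
```

With "as-if-copy" semantics, every entry in `reads` would be at most 1.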
This enhancement proposes to:
* Unify all methods by giving them the same overall structure, similar to the one used for the signed cases. The unsigned cases then share that structure as well, eliminating the need to study a separate algorithm.
* Ensure that each character subject to parsing is read at most once, from lower to higher indices ("as-if-copy" semantics).
* Remove the dependency on `parseLong()` in the `parseUnsignedInt()` methods.
* Remove the special path in `parseUnsignedLong()`.
The proposed implementation is also slightly faster (JMH speedups of 1.0x-1.4x). No performance regressions have been observed. More importantly, the overall code structure is always the same, with small adaptations for the specific cases.
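The single-pass structure proposed above can be sketched roughly as follows for the unsigned `int` case. This is a minimal illustration written for this description, not the proposed JDK code: the class `UnsignedParseSketch`, its method, and the exception messages are all assumptions. Each character is read exactly once, from lower to higher indices, and overflow is detected without falling back to `parseLong()`.

```java
public class UnsignedParseSketch {

    /** Hypothetical single-pass parser for unsigned int values; not the JDK implementation. */
    static int parseUnsignedInt(CharSequence s, int radix) {
        if (s == null) {
            throw new NumberFormatException("null input");
        }
        if (radix < Character.MIN_RADIX || radix > Character.MAX_RADIX) {
            throw new NumberFormatException("radix out of range: " + radix);
        }
        int len = s.length();
        if (len == 0) {
            throw new NumberFormatException("empty input");
        }
        int i = 0;
        char c = s.charAt(i++);             // the first character is read once and kept
        if (c == '+') {
            if (i == len) {
                throw new NumberFormatException("lone sign");
            }
            c = s.charAt(i++);
        }
        // Largest value that can be multiplied by radix without unsigned overflow.
        int limit = Integer.divideUnsigned(-1, radix);
        int result = 0;
        while (true) {
            int digit = Character.digit(c, radix);
            if (digit < 0) {
                throw new NumberFormatException("invalid character at index " + (i - 1));
            }
            if (Integer.compareUnsigned(result, limit) > 0) {
                throw new NumberFormatException("out of range for unsigned int");
            }
            int product = result * radix;   // cannot overflow: result <= limit
            result = product + digit;
            if (Integer.compareUnsigned(result, product) < 0) {
                throw new NumberFormatException("out of range for unsigned int");
            }
            if (i == len) {
                return result;
            }
            c = s.charAt(i++);              // each later character is also read exactly once
        }
    }

    public static void main(String[] args) {
        System.out.println(Integer.toUnsignedString(parseUnsignedInt("4294967295", 10)));
    }
}
```

The signed and `long` variants would follow the same loop, differing only in the sign handling and in how the limit is computed.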
Relates to: JDK-8318646 Integer#parseInt("") throws empty NumberFormatException message (Resolved)