JDK-8327703

Allow NumberFormat strict parsing

Details

    • CSR
    • Resolution: Approved
    • P4
    • 23
    • core-libs
    • None
    • behavioral
    • low
    • Changing the default parsing to strict would cause major compatibility concerns. To minimize compatibility risk, parsing remains lenient by default; strict parsing takes effect only when the user explicitly calls setStrict(true).
    • Java API
    • SE

    Description

      Summary

      Introduce optional strict parsing for the abstract class java.text.NumberFormat, which is implemented by the java.text.DecimalFormat and java.text.CompactNumberFormat concrete subclasses.

      Problem

      java.text.NumberFormat does not currently define an API for choosing between strict and lenient parsing. Other format classes in the JDK, such as java.time.format.DateTimeFormatter and java.text.DateFormat, allow for both lenient and strict parsing.

      While NumberFormat's (lenient) parsing serves its own purpose, it cannot be used for input validation, which is a common and desired use case of parsing.

      For example, given the string "1,,,0,,,00,.23Zabced45", a US locale NumberFormat (in practice, the DecimalFormat subclass) currently parses the Number 1000.23 out of the String without error. However, many users would prefer that an error occur, as the String does not match the locale's numerical conventions.
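
      The lenient behavior described above can be observed with a ParsePosition, which records where parsing stopped. This is a minimal sketch; the class name LenientParseDemo and the helper parseLenient are hypothetical names chosen for illustration.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

class LenientParseDemo {

    // Parses leniently with the default US number format. On return, pos holds
    // the index of the first character that was not consumed.
    static Number parseLenient(String text, ParsePosition pos) {
        NumberFormat fmt = NumberFormat.getNumberInstance(Locale.US);
        return fmt.parse(text, pos); // returns null (no exception) if nothing can be parsed
    }

    public static void main(String[] args) {
        String text = "1,,,0,,,00,.23Zabced45";
        ParsePosition pos = new ParsePosition(0);
        Number value = parseLenient(text, pos);
        System.out.println("parsed value: " + value);
        System.out.println("consumed " + pos.getIndex() + " of " + text.length() + " characters");
    }
}
```

      Comparing the final index against the String length shows that trailing text was silently ignored, which is why lenient parsing alone is a poor fit for input validation.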

      Solution

      Adopting Strict Parsing in the API

      Introduce optional strict parsing for java.text.NumberFormat. It is intentionally optional, as not all subclasses will differentiate between strict and lenient parsing (for example, the JDK's own ChoiceFormat subclass). This involves adding the public optional methods NumberFormat.setStrict(boolean) and NumberFormat.isStrict().

      This approach is consistent with other optional NumberFormat methods, such as NumberFormat.getCurrency(). The default implementations of these optional methods throw UnsupportedOperationException. Each subclass decides whether to support strict/lenient parsing behavior; one that does can override and implement NumberFormat.setStrict(boolean) and NumberFormat.isStrict().
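
      The existing optional-method pattern can be illustrated with getCurrency(), which ChoiceFormat inherits without overriding. This is a sketch only; OptionalMethodDemo and supportsCurrency are hypothetical names, and the new strict-parsing methods are described above as following the same convention.

```java
import java.text.ChoiceFormat;
import java.text.NumberFormat;
import java.util.Locale;

class OptionalMethodDemo {

    // Returns true if the format supports the optional getCurrency() operation.
    static boolean supportsCurrency(NumberFormat fmt) {
        try {
            fmt.getCurrency();
            return true;
        } catch (UnsupportedOperationException e) {
            // The NumberFormat default implementation was not overridden.
            return false;
        }
    }

    public static void main(String[] args) {
        NumberFormat currency = NumberFormat.getCurrencyInstance(Locale.US);
        NumberFormat choice = new ChoiceFormat("0#none|1#one|2#many");
        System.out.println("currency format supports getCurrency: " + supportsCurrency(currency));
        System.out.println("choice format supports getCurrency: " + supportsCurrency(choice));
    }
}
```

      Callers that receive an arbitrary NumberFormat could guard calls to isStrict() and setStrict(boolean) in the same way.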

      A potential alternative is to introduce a leniency interface. It was decided against for the following reasons:

      1) Defining optional methods that throw UnsupportedOperationException when unsupported is more consistent with the existing structure of NumberFormat's API.

      2) DateFormat cannot implement the potential interface, as it already defines its own leniency-related methods. Thus the benefit of defining a whole new interface is small.

      With these changes, the specifications of the existing parse methods are updated so that the concepts of parsing and leniency are specific to the implementing class. For example, ChoiceFormat.parse(String, ParsePosition) no longer mentions isParseIntegerOnly(), as that method has no effect on ChoiceFormat parsing. DecimalFormat.parse(String, ParsePosition) and CompactNumberFormat.parse(String, ParsePosition) continue to mention isParseIntegerOnly() in the method description, as the parsing implementation of these classes is affected by isParseIntegerOnly().
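
      The difference in how the two classes treat isParseIntegerOnly() can be sketched as follows. IntegerOnlyDemo, parseIntegerOnly, and the ChoiceFormat pattern are hypothetical and chosen only for illustration.

```java
import java.text.ChoiceFormat;
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

class IntegerOnlyDemo {

    // Enables integer-only parsing on the given format, then parses from index 0.
    static Number parseIntegerOnly(NumberFormat fmt, String text) {
        fmt.setParseIntegerOnly(true);
        return fmt.parse(text, new ParsePosition(0));
    }

    public static void main(String[] args) {
        // DecimalFormat honors the flag: lenient parsing stops at the decimal separator.
        NumberFormat decimal = NumberFormat.getNumberInstance(Locale.US);
        System.out.println(parseIntegerOnly(decimal, "3.7"));

        // ChoiceFormat maps text back to one of its limits; the flag plays no part.
        NumberFormat choice = new ChoiceFormat("0#zero|1.5#one and a half");
        System.out.println(parseIntegerOnly(choice, "one and a half"));
    }
}
```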

      Strict Parsing Behavior

      In both DecimalFormat and CompactNumberFormat, strict parsing enforces the grouping size, requires exact prefix and suffix matching, and fails when unexpected characters are found. This allows users to verify that a String adheres to the format of the desired locale, with a clear failure if it does not. The exact rules that define strict parsing are outlined in the specification section.

      Usage is straightforward, for example,

      DecimalFormat fmt = (DecimalFormat) NumberFormat.getNumberInstance(Locale.US);
      fmt.parse("1,,,0,,,00,.23Zabced45");  // returns 1000.23
      fmt.setStrict(true);
      fmt.parse("1,,,0,,,00,.23Zabced45"); // Now throws a ParseException
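
      For input validation, the usage above could be wrapped in a small helper. This is a sketch that requires JDK 23 or later; StrictValidationDemo and validate are hypothetical names, and the code assumes the factory returns a format that supports setStrict (as DecimalFormat does).

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;
import java.util.OptionalDouble;

class StrictValidationDemo {

    // Returns the parsed value if the text is well-formed for the locale,
    // or an empty OptionalDouble if strict parsing rejects it.
    static OptionalDouble validate(String text, Locale locale) {
        NumberFormat fmt = NumberFormat.getNumberInstance(locale);
        fmt.setStrict(true); // optional operation; supported by DecimalFormat since JDK 23
        try {
            return OptionalDouble.of(fmt.parse(text).doubleValue());
        } catch (ParseException e) {
            return OptionalDouble.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(validate("1,000.23", Locale.US));
        System.out.println(validate("1,,,0,,,00,.23Zabced45", Locale.US));
    }
}
```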

      Specification

      Format

      Add to the Format class description,

      + * <p> Subclasses may also consider implementing leniency when parsing.
      + * The definition of leniency should be delegated to the subclass.

      Update the Format.parseObject(string, ParsePosition) wording,

           /**
      -     * Parses text from a string to produce an object.
      +     * Parses text from the given string to produce an object.
            * <p>
      -     * The method attempts to parse text starting at the index given by
      -     * {@code pos}.
      -     * If parsing succeeds, then the index of {@code pos} is updated
      +     * This method attempts to parse text starting at the index given by
      +     * {@code pos}. If parsing succeeds, then the index of {@code pos} is updated
            * to the index after the last character used (parsing does not necessarily
            * use all characters up to the end of the string), and the parsed
            * object is returned. The updated {@code pos} can be used to
            * indicate the starting point for the next call to this method.
            * If an error occurs, then the index of {@code pos} is not
            * changed, the error index of {@code pos} is set to the index of
      -     * the character where the error occurred, and null is returned.
      +     * the character where the error occurred, and {@code null} is returned.
            *
      -     * @param source A {@code String}, part of which should be parsed.
      +     * @param source the {@code String} to parse
            * @param pos A {@code ParsePosition} object with index and error
            *            index information as described above.
            * @return An {@code Object} parsed from the string. In case of
      -     *         error, returns null.
      -     * @throws NullPointerException if {@code source} or {@code pos} is null.
      +     *         error, returns {@code null}.
      +     * @throws NullPointerException if {@code source} or {@code pos} is
      +     *         {@code null}.
            */
           public abstract Object parseObject (String source, ParsePosition pos);

      Update the Format.parseObject(string) wording,

           /**
            * Parses text from the beginning of the given string to produce an object.
      -     * The method may not use the entire text of the given string.
      +     * This method may not use the entire text of the given string.
            *
      -     * @param source A {@code String} whose beginning should be parsed.
      +     * @param source A {@code String}, to be parsed from the beginning.
            * @return An {@code Object} parsed from the string.
      -     * @throws    ParseException if the beginning of the specified string
      -     *            cannot be parsed.
      -     * @throws NullPointerException if {@code source} is null.
      +     * @throws ParseException if parsing fails
      +     * @throws NullPointerException if {@code source} is {@code null}.
            */
           public Object parseObject(String source) throws ParseException {

      NumberFormat

      Add a leniency section in the java.text.NumberFormat class description,

      + * <h2><a id="leniency">Leniency</a></h2>
      + * {@code NumberFormat} by default, parses leniently. Subclasses may consider
      + * implementing strict parsing and as such, overriding and providing
      + * implementations for the optional {@link #isStrict()} and {@link
      + * #setStrict(boolean)} methods.
        * <p>
      + * Lenient parsing should be used when attempting to parse a number
      + * out of a String that contains non-numerical or non-format related values.
      + * For example, using a {@link Locale#US} currency format to parse the number
      + * {@code 1000} out of the String "$1,000.00 was paid".
      + * <p>
      + * Strict parsing should be used when attempting to ensure a String adheres exactly
      + * to a locale's conventions, and can thus serve to validate input. For example, successfully
      + * parsing the number {@code 1000.55} out of the String "1.000,55" confirms the String
      + * exactly adhered to the {@link Locale#GERMANY} numerical conventions.
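
      The lenient use case from the class description, extracting an amount from surrounding text, might look like this. LenientExtractionDemo and extractAmount are hypothetical names used only in this sketch.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

class LenientExtractionDemo {

    // Leniently parses a leading US currency amount; returns null if the text
    // does not start with one.
    static Number extractAmount(String text) {
        NumberFormat fmt = NumberFormat.getCurrencyInstance(Locale.US);
        return fmt.parse(text, new ParsePosition(0));
    }

    public static void main(String[] args) {
        System.out.println(extractAmount("$1,000.00 was paid"));
        System.out.println(extractAmount("no amount here"));
    }
}
```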

      Add the methods, NumberFormat.isStrict() and setStrict()

      +    /**
      +     * {@return {@code true} if this format will parse numbers strictly;
      +     * {@code false} otherwise}
      +     *
      +     * @implSpec The default implementation always throws {@code
      +     * UnsupportedOperationException}. Subclasses should override this method
      +     * when implementing strict parsing.
      +     * @throws    UnsupportedOperationException if the implementation of this
      +     *            method does not support this operation
      +     * @see ##leniency Leniency Section
      +     * @see #setStrict(boolean)
      +     * @since 23
      +     */
      +    public boolean isStrict() {
      ...
      +    /**
      +     * Change the leniency value for parsing. Parsing can either be strict or lenient,
      +     * by default it is lenient.
      +     *
      +     * @implSpec The default implementation always throws {@code
      +     * UnsupportedOperationException}. Subclasses should override this method
      +     * when implementing strict parsing.
      +     * @param strict {@code true} if parsing should be done strictly;
      +     *               {@code false} otherwise
      +     * @throws    UnsupportedOperationException if the implementation of this
      +     *            method does not support this operation
      +     * @see ##leniency Leniency Section
      +     * @see #isStrict()
      +     * @since 23
      +     */
      +    public void setStrict(boolean strict) {

      Update NumberFormat.parseObject(String, ParsePosition)

           /**
      -     * Parses text from a string to produce a {@code Number}.
      -     * <p>
      -     * The method attempts to parse text starting at the index given by
      -     * {@code pos}.
      -     * If parsing succeeds, then the index of {@code pos} is updated
      -     * to the index after the last character used (parsing does not necessarily
      -     * use all characters up to the end of the string), and the parsed
      -     * number is returned. The updated {@code pos} can be used to
      -     * indicate the starting point for the next call to this method.
      -     * If an error occurs, then the index of {@code pos} is not
      -     * changed, the error index of {@code pos} is set to the index of
      -     * the character where the error occurred, and null is returned.
      -     * <p>
      -     * See the {@link #parse(String, ParsePosition)} method for more information
      -     * on number parsing.
      +     * {@inheritDoc Format}
            *
      -     * @param source A {@code String}, part of which should be parsed.
      +     * @implSpec This implementation is equivalent to calling {@code parse(source,
      +     *           pos)}.
      +     * @param source the {@code String} to parse
            * @param pos A {@code ParsePosition} object with index and error
            *            index information as described above.
            * @return A {@code Number} parsed from the string. In case of
            *         error, returns null.
            * @throws NullPointerException if {@code source} or {@code pos} is null.
            */
           @Override
           public final Object parseObject(String source, ParsePosition pos) {

      Update NumberFormat.parse(String, ParsePosition)

           /**
      -     * Returns a Long if possible (e.g., within the range [Long.MIN_VALUE,
      +     * Parses text from the beginning of the given string to produce a {@code Number}.
      +     * <p>
      +     * This method attempts to parse text starting at the index given by the
      +     * {@code ParsePosition}. If parsing succeeds, then the index of the {@code
      +     * ParsePosition} is updated to the index after the last character used
      +     * (parsing does not necessarily use all characters up to the end of the
      +     * string), and the parsed number is returned. The updated {@code
      +     * ParsePosition} can be used to indicate the starting
      +     * point for the next call to this method. If an error occurs, then the
      +     * index of the {@code ParsePosition} is not changed, the error index of the
      +     * {@code ParsePosition} is set to the index of the character where the error
      +     * occurred, and {@code null} is returned.
      +     * <p>
      +     * This method will return a Long if possible (e.g., within the range [Long.MIN_VALUE,
            * Long.MAX_VALUE] and with no decimals), otherwise a Double.
      -     * If IntegerOnly is set, will stop at a decimal
      -     * point (or equivalent; e.g., for rational numbers "1 2/3", will stop
      -     * after the 1).
      -     * Does not throw an exception; if no object can be parsed, index is
      -     * unchanged!
            *
      -     * @param source the String to parse
      -     * @param parsePosition the parse position
      -     * @return the parsed value
      -     * @see java.text.NumberFormat#isParseIntegerOnly
      -     * @see java.text.Format#parseObject
      +     * @param source the {@code String} to parse
      +     * @param parsePosition A {@code ParsePosition} object with index and error
      +     *            index information as described above.
      +     * @return A {@code Number} parsed from the string. In case of
      +     *         failure, returns {@code null}.
      +     * @throws NullPointerException if {@code source} or {@code ParsePosition}
      +     *         is {@code null}.
      +     * @see #isStrict()
            */
           public abstract Number parse(String source, ParsePosition parsePosition);

      Update NumberFormat.parse(String)

           /**
      -     * Parses text from the beginning of the given string to produce a number.
      -     * The method may not use the entire text of the given string.
      +     * Parses text from the beginning of the given string to produce a {@code Number}.
            * <p>
      -     * See the {@link #parse(String, ParsePosition)} method for more information
      -     * on number parsing.
      +     * This method will return a Long if possible (e.g., within the range [Long.MIN_VALUE,
      +     * Long.MAX_VALUE] and with no decimals), otherwise a Double.
            *
      -     * @param source A {@code String} whose beginning should be parsed.
      +     * @param source A {@code String}, to be parsed from the beginning.
            * @return A {@code Number} parsed from the string.
      -     * @throws    ParseException if the beginning of the specified string
      -     *            cannot be parsed.
      +     * @throws ParseException if parsing fails
      +     * @throws NullPointerException if {@code source} is {@code null}.
      +     * @see #isStrict()
            */
           public Number parse(String source) throws ParseException {

      DecimalFormat

      Update DecimalFormat.parse(String, ParsePosition)

      -     * Parses text from a string to produce a {@code Number}.
      +     * {@inheritDoc NumberFormat}
      +     * <p>
      +     * Parsing can be done in either a strict or lenient manner, by default it is lenient.
            * <p>
      -     * The method attempts to parse text starting at the index given by
      -     * {@code pos}.
      -     * If parsing succeeds, then the index of {@code pos} is updated
      -     * to the index after the last character used (parsing does not necessarily
      -     * use all characters up to the end of the string), and the parsed
      -     * number is returned. The updated {@code pos} can be used to
      -     * indicate the starting point for the next call to this method.
      -     * If an error occurs, then the index of {@code pos} is not
      -     * changed, the error index of {@code pos} is set to the index of
      -     * the character where the error occurred, and null is returned.
      +     * Parsing fails when <b>lenient</b>, if the prefix and/or suffix are non-empty
      +     * and cannot be found due to parsing ending early, or the first character
      +     * after the prefix cannot be parsed.
      +     * <p>
      +     * Parsing fails when <b>strict</b>, if in {@code text},
      +     * <ul>
      +     *   <li> The prefix is not found. For example, a {@code Locale.US} currency
      +     *   format prefix: "{@code $}"
      +     *   <li> The suffix is not found. For example, a {@code Locale.US} percent
      +     *   format suffix: "{@code %}"
      +     *   <li> {@link #isGroupingUsed()} returns {@code true}, and {@link
      +     *   #getGroupingSize()} is not adhered to
      +     *   <li> {@link #isGroupingUsed()} returns {@code false}, and the grouping
      +     *   symbol is found
      +     *   <li> {@link #isParseIntegerOnly()} returns {@code true}, and the decimal
      +     *   separator is found
      +     *   <li> {@link #isGroupingUsed()} returns {@code true} and {@link
      +     *   #isParseIntegerOnly()} returns {@code false}, and the grouping
      +     *   symbol occurs after the decimal separator
      +     *   <li> Any other characters are found, that are not the expected symbols,
      +     *   and are not digits that occur within the numerical portion
      +     * </ul>
            * <p>
            * The subclass returned depends on the value of {@link #isParseBigDecimal}
            * as well as on the string being parsed.
      ...
            * @return     the parsed value, or {@code null} if the parse fails
            * @throws     NullPointerException if {@code text} or
            *             {@code pos} is null.
            */
           @Override
           public Number parse(String text, ParsePosition pos) {
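
      Several of the strict failure conditions listed above can be exercised one at a time. This sketch requires JDK 23 or later; StrictRulesDemo and its helpers are hypothetical names, and the inputs were chosen to match individual bullets in the list.

```java
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

class StrictRulesDemo {

    // Creates a fresh strict US format. The cast assumes the factory returns
    // a DecimalFormat, which holds for the JDK's built-in locale data.
    static DecimalFormat strictUs() {
        DecimalFormat fmt = (DecimalFormat) NumberFormat.getNumberInstance(Locale.US);
        fmt.setStrict(true);
        return fmt;
    }

    // Returns the parsed number, or null if parsing fails.
    static Number parse(DecimalFormat fmt, String text) {
        return fmt.parse(text, new ParsePosition(0));
    }

    public static void main(String[] args) {
        System.out.println(parse(strictUs(), "1,000.5"));  // adheres to the US conventions
        System.out.println(parse(strictUs(), "1,00.5"));   // grouping size of 3 not adhered to
        System.out.println(parse(strictUs(), "1.5,000"));  // grouping symbol after the decimal separator
        System.out.println(parse(strictUs(), "1,000.5x")); // unexpected trailing character

        DecimalFormat noGrouping = strictUs();
        noGrouping.setGroupingUsed(false);
        System.out.println(parse(noGrouping, "1,000"));    // grouping symbol found while grouping is off

        DecimalFormat integerOnly = strictUs();
        integerOnly.setParseIntegerOnly(true);
        System.out.println(parse(integerOnly, "3.7"));     // decimal separator found while integer-only
    }
}
```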

      Override and specify DecimalFormat.isStrict(),

      +    /**
      +     * {@inheritDoc NumberFormat}
      +     *
      +     * @see #setStrict(boolean)
      +     * @see #parse(String, ParsePosition)
      +     * @since 23
      +     */
      +    @Override
      +    public boolean isStrict() {

      Override and specify DecimalFormat.setStrict(boolean),

      +
      +    /**
      +     * {@inheritDoc NumberFormat}
      +     *
      +     * @see #isStrict()
      +     * @see #parse(String, ParsePosition)
      +     * @since 23
      +     */
      +    @Override
      +    public void setStrict(boolean strict) {

      CompactNumberFormat

      Update CompactNumberFormat.parse(String, ParsePosition)

           /**
      -     * Parses a compact number from a string to produce a {@code Number}.
      +     * {@inheritDoc NumberFormat}
            * <p>
      -     * The method attempts to parse text starting at the index given by
      -     * {@code pos}.
      -     * If parsing succeeds, then the index of {@code pos} is updated
      -     * to the index after the last character used (parsing does not necessarily
      -     * use all characters up to the end of the string), and the parsed
      -     * number is returned. The updated {@code pos} can be used to
      -     * indicate the starting point for the next call to this method.
      -     * If an error occurs, then the index of {@code pos} is not
      -     * changed, the error index of {@code pos} is set to the index of
      -     * the character where the error occurred, and {@code null} is returned.
      -     * <p>
      -     * The value is the numeric part in the given text multiplied
      +     * The returned value is the numeric part in the given text multiplied
            * by the numeric equivalent of the affix attached
            * (For example, "K" = 1000 in {@link java.util.Locale#US US locale}).
      +     * <p>
      +     * A {@code CompactNumberFormat} can match
      +     * the default prefix/suffix to a compact prefix/suffix interchangeably.
      +     * <p>
      +     * Parsing can be done in either a strict or lenient manner, by default it is lenient.
      +     * <p>
      +     * Parsing fails when <b>lenient</b>, if the prefix and/or suffix are non-empty
      +     * and cannot be found due to parsing ending early, or the first character
      +     * after the prefix cannot be parsed.
      +     * <p>
      +     * Parsing fails when <b>strict</b>, if in {@code text},
      +     * <ul>
      +     *   <li> The default or a compact prefix is not found. For example, the {@code
      +     *   Locale.US} currency format prefix: "{@code $}"
      +     *   <li> The default or a compact suffix is not found. For example, a {@code Locale.US}
      +     *   {@link NumberFormat.Style#SHORT} compact suffix: "{@code K}"
      +     *   <li> {@link #isGroupingUsed()} returns {@code false}, and the grouping
      +     *   symbol is found
      +     *   <li> {@link #isGroupingUsed()} returns {@code true}, and {@link
      +     *   #getGroupingSize()} is not adhered to
      +     *   <li> {@link #isParseIntegerOnly()} returns {@code true}, and the decimal
      +     *   separator is found
      +     *   <li> {@link #isGroupingUsed()} returns {@code true} and {@link
      +     *   #isParseIntegerOnly()} returns {@code false}, and the grouping
      +     *   symbol occurs after the decimal separator
      +     *   <li> Any other characters are found, that are not the expected symbols,
      +     *   and are not digits that occur within the numerical portion
      +     * </ul>
      +     * <p>
            * The subclass returned depends on the value of
            * {@link #isParseBigDecimal}.
            * <ul>
      ...
            * @return the parsed value, or {@code null} if the parse fails
            * @throws     NullPointerException if {@code text} or
            *             {@code pos} is null
      -     *
            */
           @Override
           public Number parse(String text, ParsePosition pos) {
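
      A comparable sketch for CompactNumberFormat, again requiring JDK 23 or later. CompactStrictDemo and parseCompact are hypothetical names; the example assumes the US SHORT style uses the compact suffix "K" for thousands, as stated in the specification text above.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

class CompactStrictDemo {

    // Parses text with a US SHORT compact format, strictly or leniently.
    // Returns null if parsing fails.
    static Number parseCompact(String text, boolean strict) {
        NumberFormat fmt =
                NumberFormat.getCompactNumberInstance(Locale.US, NumberFormat.Style.SHORT);
        fmt.setStrict(strict);
        return fmt.parse(text, new ParsePosition(0));
    }

    public static void main(String[] args) {
        System.out.println(parseCompact("1K apples", false)); // lenient: trailing text is ignored
        System.out.println(parseCompact("1K", true));         // strict: text ends with the compact suffix
        System.out.println(parseCompact("1K apples", true));  // strict: unexpected trailing characters
    }
}
```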

      Override and specify CompactNumberFormat.isStrict(),

      +    /**
      +     * {@inheritDoc NumberFormat}
      +     *
      +     * @see #setStrict(boolean)
      +     * @see #parse(String, ParsePosition)
      +     * @since 23
      +     */
      +    @Override
      +    public boolean isStrict() {

      Override and specify CompactNumberFormat.setStrict(boolean),

      +    /**
      +     * {@inheritDoc NumberFormat}
      +     *
      +     * @see #isStrict()
      +     * @see #parse(String, ParsePosition)
      +     * @since 23
      +     */
      +    @Override
      +    public void setStrict(boolean strict) {

      The private field parseStrict is added to both DecimalFormat and CompactNumberFormat to implement the methods above. The fields are listed here because they affect the serialized form of both classes.

      CompactNumberFormat,

      +    /**
      +     * True if this {@code CompactNumberFormat} will parse numbers with strict
      +     * leniency.
      +     *
      +     * @serial
      +     * @since 23
      +     */
      +    private boolean parseStrict = false;

      DecimalFormat,

      +    /**
      +     * True if this {@code DecimalFormat} will parse numbers with strict
      +     * leniency.
      +     *
      +     * @serial
      +     * @since 23
      +     */
      +    private boolean parseStrict = false;
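
      Because parseStrict is part of the serialized form, the strict setting should survive a serialization round trip. The following sketch checks this for DecimalFormat on JDK 23 or later; StrictSerializationDemo and strictAfterRoundTrip are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.util.Locale;

class StrictSerializationDemo {

    // Serializes a DecimalFormat with the given strictness to a byte array,
    // reads it back, and reports the strictness of the deserialized copy.
    static boolean strictAfterRoundTrip(boolean strict) {
        DecimalFormat original = (DecimalFormat) NumberFormat.getNumberInstance(Locale.US);
        original.setStrict(strict);
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                         new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return ((DecimalFormat) in.readObject()).isStrict();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(strictAfterRoundTrip(true));
        System.out.println(strictAfterRoundTrip(false));
    }
}
```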

            People

              jlu Justin Lu
              jlu Justin Lu
              Naoto Sato