JDK-8364132

Lenient parsing of minus sign pattern in DecimalFormat/CompactNumberFormat

    • Type: CSR
    • Resolution: Unresolved
    • Priority: P4
    • 26
    • core-libs
    • None
    • behavioral
    • low
    • Source texts with lenient minus sign characters prepended to numbers may now be parsed without throwing an exception: `DecimalFormat` and `CompactNumberFormat` accept lenient minus signs when `isStrict()` returns false. However, I believe such cases are quite rare, and they would have resulted in an exception before this fix.
    • Java API
    • SE

      Summary

      Enable loose matching of dash and minusSign in DecimalFormat and CompactNumberFormat number parsing.

      Problem

      Parsing minus signs in numbers can be problematic due to the presence of multiple visually similar characters. This becomes especially troublesome in locales where the default minus sign is not the U+002D HYPHEN-MINUS, for example, in the Finnish locale. In such cases, users typically enter the hyphen-minus from their keyboard, but it may not be recognized as a valid minus sign during number parsing.
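
      The mismatch can be reproduced without relying on any particular locale data. The sketch below is an illustration only: it builds a DecimalFormat whose minus sign is U+2212 MINUS SIGN (standing in for the Finnish default) and parses input typed with either character. The class and helper names are hypothetical. The U+2212 input parses on every JDK release; the hyphen-minus input is the case this CSR targets, so its result depends on the JDK version in use (null before this change), which is why no expected output is given for it.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParsePosition;
import java.util.Locale;

// Hypothetical demo class; not part of the JDK or of this CSR.
final class MinusSignProblemDemo {

    /** Builds a format whose minus sign is U+2212, standing in for the Finnish default. */
    static DecimalFormat formatWithUnicodeMinus() {
        DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.ROOT);
        symbols.setMinusSign('\u2212');
        return new DecimalFormat("#,##0.###", symbols);
    }

    /** Parses the whole text; returns null if parsing fails or stops early. */
    static Number parseAll(DecimalFormat format, String text) {
        ParsePosition pos = new ParsePosition(0);
        Number result = format.parse(text, pos);
        return (result != null && pos.getIndex() == text.length()) ? result : null;
    }

    public static void main(String[] args) {
        DecimalFormat format = formatWithUnicodeMinus();

        // Input that uses the format's own minus sign (U+2212): prints -5.
        System.out.println(parseAll(format, "\u22125"));

        // Input typed with the keyboard hyphen-minus (U+002D).
        // The result depends on whether the running JDK includes this change.
        System.out.println(parseAll(format, "-5"));
    }
}
```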

      Solution

      CLDR defines the concept of parse leniency. Applying this leniency to minus signs addresses the issue described above. NumberFormat now allows implementations to parse the minus sign in negative patterns leniently when in lenient mode (NumberFormat.isStrict() returns false). The concrete classes DecimalFormat and CompactNumberFormat both use CLDR’s parseLenient data to support lenient parsing of minus signs, and this behavior is enabled by default. To disable lenient parsing, call NumberFormat.setStrict(true).
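
      As a rough conceptual sketch of loose matching (not the JDK implementation), the idea is that in lenient mode any character from a set of minus-like characters may stand in for the pattern's minus sign, while strict mode requires an exact match. The class name, the method, and the four-character set below are assumptions made for illustration; the actual set comes from CLDR's parseLenient data.

```java
import java.util.Set;

// Hypothetical sketch; names and character set are illustrative assumptions.
final class LooseMinusSketch {

    // Small illustrative subset of minus-like characters (assumed here);
    // CLDR's parseLenient data is the authoritative source.
    private static final Set<Character> LENIENT_MINUS_SIGNS = Set.of(
            '-',        // U+002D HYPHEN-MINUS
            '\u2212',   // MINUS SIGN
            '\uFE63',   // SMALL HYPHEN-MINUS
            '\uFF0D');  // FULLWIDTH HYPHEN-MINUS

    /**
     * Returns true if the input character may be treated as the pattern's minus sign.
     * Strict mode requires an exact match; lenient mode also accepts any character
     * in the lenient set, provided the pattern's minus sign is in that set too.
     */
    static boolean matchesMinus(char input, char patternMinus, boolean strict) {
        if (input == patternMinus) {
            return true;
        }
        return !strict
                && LENIENT_MINUS_SIGNS.contains(input)
                && LENIENT_MINUS_SIGNS.contains(patternMinus);
    }

    public static void main(String[] args) {
        char patternMinus = '\u2212'; // minus sign used by the format's pattern

        System.out.println(matchesMinus('-', patternMinus, false)); // true: lenient
        System.out.println(matchesMinus('-', patternMinus, true));  // false: strict
        System.out.println(matchesMinus('+', patternMinus, false)); // false: not a minus sign
    }
}
```

      In the JDK itself this choice is controlled through NumberFormat.setStrict(boolean), as described above, rather than through a per-call flag.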

      Specification

      Add the following description to the class description of java.text.NumberFormat.

      --- a/src/java.base/share/classes/java/text/NumberFormat.java
      +++ b/src/java.base/share/classes/java/text/NumberFormat.java
      @@ -192,7 +192,10 @@
        * Lenient parsing should be used when attempting to parse a number
        * out of a String that contains non-numerical or non-format related values.
        * For example, using a {@link Locale#US} currency format to parse the number
      - * {@code 1000} out of the String "$1,000.00 was paid".
      + * {@code 1000} out of the String "$1,000.00 was paid". Lenient parsing also
      + * allows loose matching of characters in the source text. For example, an
      + * implementation of the {@code NumberFormat} class may allow matching "−"
      + * (U+2212 MINUS SIGN) to the "-" (U+002D HYPHEN-MINUS) pattern character.
        * <p>
        * Strict parsing should be used when attempting to ensure a String adheres exactly
        * to a locale's conventions, and can thus serve to validate input. For example, successfully
      

      Add an HTML anchor to the title of the Negative Subpatterns section in the class descriptions of java.text.CompactNumberFormat and java.text.DecimalFormat.

      - * <h3>Negative Subpatterns</h3>
      + * <h3><a id="negative_subpatterns">Negative Subpatterns</a></h3>

      Append the following description to the first paragraph of the Negative Subpatterns section in the class descriptions of java.text.CompactNumberFormat and java.text.DecimalFormat.

      + * In
      + * {@link NumberFormat##leniency lenient parsing} mode, loose matching of the
      + * minus sign pattern is enabled, following the LDML’s
      + * <a href="https://unicode.org/reports/tr35/#Loose_Matching">
      + * loose matching</a> specification.

      Add the following paragraph to the method description of parse() in java.text.CompactNumberFormat and java.text.DecimalFormat, just after the conditions under which strict parsing fails.

      +     * When lenient, the minus sign in the {@link ##negative_subpatterns
      +     * negative subpatterns} is loosely matched against lenient minus sign characters.

            naoto Naoto Sato