JDK-8324990

Loose matching of space separators in the lenient date/time parsing mode


    • Type: CSR
    • Resolution: Approved
    • Priority: P4
    • Fix Version/s: 23
    • Component/s: core-libs
    • Labels: None
    • Compatibility Kind: behavioral
    • Compatibility Risk: low
    • Compatibility Risk Description: Although uncommon, applications that distinguish the ASCII space from other space separators will see behavioral changes in the lenient mode. Such applications can avoid the change by using the strict mode, which retains the original behavior.
    • Interface Kind: Java API
    • Scope: SE

      Summary

      Allow loose matching of space separators for both java.time.format and java.text date/time formatters in the lenient parsing mode.

      Problem

      JDK 20 upgraded the CLDR locale data to version 42, which replaces the ASCII space (U+0020) between the time and the am/pm marker with NNBSP (Narrow No-Break Space, U+202F) in English locales. As a result, the localized parsers throw an exception when the input text has an ASCII space in that position. Although this is the expected behavior (JDK-8304925), the change broke some applications. Because NNBSP cannot be distinguished visually from an ASCII space and is not easy to type, it is not practical for applications to require NNBSP in their input (JDK-8324308). To address this, the JDK's parsers should loosen their matching of space separators.
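      The two characters are also easy to confuse in code. A small sketch, assuming only the standard java.lang.Character API (the class name SpaceCheck is hypothetical), shows that both U+0020 and U+202F belong to the Zs category (Character.SPACE_SEPARATOR), while Character.isWhitespace() returns false for NNBSP because that method excludes no-break spaces:

```java
final class SpaceCheck {
    static final char ASCII_SPACE = '\u0020'; // SPACE
    static final char NNBSP = '\u202F';       // NARROW NO-BREAK SPACE

    // True if the character is in the Unicode Zs (space separator) category.
    static boolean isSpaceSeparator(char c) {
        return Character.getType(c) == Character.SPACE_SEPARATOR;
    }

    public static void main(String[] args) {
        System.out.println(isSpaceSeparator(ASCII_SPACE));       // true
        System.out.println(isSpaceSeparator(NNBSP));             // true
        System.out.println(Character.isWhitespace(ASCII_SPACE)); // true
        System.out.println(Character.isWhitespace(NNBSP));       // false
    }
}
```

      A check based on isWhitespace() would therefore treat the two separators differently, which is why the category test is the relevant one here.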

      Solution

      The CLDR specification suggests "Loose Matching" of characters in the Zs category (Character.SPACE_SEPARATOR), so that differences between the ASCII space and other space separators, including NNBSP, can be ignored. Since the date/time parsers in both java.time.format and java.text have a concept of lenient parsing, they can treat all space separators as equivalent in their lenient parsing mode.
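      The loose-matching rule itself can be illustrated with a short sketch. The class LooseSpaceMatcher and its methods are hypothetical names for illustration, not the JDK's internal parser code: a literal character in the pattern matches an input character if the two are equal or, only in lenient mode, if both are in the Zs category.

```java
final class LooseSpaceMatcher {
    // True if the character is in the Unicode Zs category (Character.SPACE_SEPARATOR).
    private static boolean isSpaceSeparator(char c) {
        return Character.getType(c) == Character.SPACE_SEPARATOR;
    }

    // Compares one input character with one literal pattern character.
    // Exact matches always succeed; in lenient mode, any two space separators also match.
    static boolean charsMatch(char input, char pattern, boolean lenient) {
        if (input == pattern) {
            return true;
        }
        return lenient && isSpaceSeparator(input) && isSpaceSeparator(pattern);
    }

    // Checks whether 'text' contains the pattern literal 'literal' starting at 'pos'.
    static boolean matchesAt(CharSequence text, int pos, String literal, boolean lenient) {
        if (pos < 0 || pos + literal.length() > text.length()) {
            return false;
        }
        for (int i = 0; i < literal.length(); i++) {
            if (!charsMatch(text.charAt(pos + i), literal.charAt(i), lenient)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String typed = "3:30 PM";    // ASCII space at index 4
        String literal = "\u202FPM"; // pattern literal starting with NNBSP
        System.out.println(matchesAt(typed, 4, literal, true));  // true
        System.out.println(matchesAt(typed, 4, literal, false)); // false
    }
}
```

      Note that a character such as a tab (category Cc) still does not match a space separator in this sketch, since only Zs characters are treated as interchangeable.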

      In the java.time.format package, the default parsing mode is strict, so applications need to enable leniency explicitly by calling DateTimeFormatterBuilder.parseLenient(), for example:

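          import java.time.format.*;
          import java.util.Locale;
          // The lenient parse style applies to the localized pattern appended after it.
          var dtf = new DateTimeFormatterBuilder()
              .parseLenient()
              .append(DateTimeFormatter.ofLocalizedTime(FormatStyle.SHORT))
              .toFormatter(Locale.ENGLISH);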
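      A minimal usage sketch of this formatter, assuming JDK 23 or later (where this change is delivered) and the default CLDR locale data; the class name LenientTimeParse is hypothetical. Under those assumptions, the lenient formatter should accept "3:30 PM" typed with an ASCII space, while a strict formatter rejects it because the English CLDR pattern contains NNBSP:

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.time.format.FormatStyle;
import java.util.Locale;

final class LenientTimeParse {
    // Lenient parsing of the localized SHORT time style, built as in the snippet above.
    static final DateTimeFormatter LENIENT = new DateTimeFormatterBuilder()
            .parseLenient()
            .append(DateTimeFormatter.ofLocalizedTime(FormatStyle.SHORT))
            .toFormatter(Locale.ENGLISH);

    // Strict counterpart; strict is the builder's default, stated explicitly for contrast.
    static final DateTimeFormatter STRICT = new DateTimeFormatterBuilder()
            .parseStrict()
            .append(DateTimeFormatter.ofLocalizedTime(FormatStyle.SHORT))
            .toFormatter(Locale.ENGLISH);

    // Parses the text with the chosen style; throws DateTimeParseException on failure.
    static LocalTime parse(String text, boolean lenient) {
        return LocalTime.parse(text, lenient ? LENIENT : STRICT);
    }

    public static void main(String[] args) {
        String typed = "3:30 PM"; // ASCII space, as a user would likely type it
        for (boolean lenient : new boolean[] {true, false}) {
            String label = lenient ? "lenient: " : "strict:  ";
            try {
                System.out.println(label + parse(typed, lenient));
            } catch (DateTimeParseException e) {
                System.out.println(label + e.getMessage());
            }
        }
    }
}
```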

      In the java.text package, the default parsing mode is lenient, so applications will parse all space separators equally without any code change (that is, the behavior changes by default). Applications that need to parse the text strictly can disable leniency:

          import java.text.DateFormat;
          import java.util.Locale;
          var df = DateFormat.getTimeInstance(DateFormat.SHORT, Locale.ENGLISH);
          df.setLenient(false); // opt back into strict parsing
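      The java.text side can be sketched the same way, again assuming JDK 23 or later and the default CLDR locale data; TextTimeParse and canParse() are hypothetical names. With the default lenient setting, the ASCII-space input should parse, and after setLenient(false) it should fail with a ParseException:

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.util.Locale;

final class TextTimeParse {
    // Returns true if the English SHORT time format accepts the text with the given leniency.
    static boolean canParse(String text, boolean lenient) {
        DateFormat df = DateFormat.getTimeInstance(DateFormat.SHORT, Locale.ENGLISH);
        df.setLenient(lenient); // java.text formats are lenient unless told otherwise
        try {
            df.parse(text);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String typed = "3:30 PM"; // ASCII space between the time and the am/pm marker
        System.out.println("lenient: " + canParse(typed, true));
        System.out.println("strict:  " + canParse(typed, false));
    }
}
```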

      Specification

      In java.time.format.DateTimeFormatterBuilder.parseLenient(), add the following:

      +     * @implSpec A {@link Character#SPACE_SEPARATOR SPACE_SEPARATOR} in the input
      +     * text will match any other {@link Character#SPACE_SEPARATOR SPACE_SEPARATOR}s
      +     * in the pattern with the lenient parse style.

      In java.time.format.DateTimeFormatterBuilder.parseStrict(), add the following:

      +     * @implSpec A {@link Character#SPACE_SEPARATOR SPACE_SEPARATOR} in the input
      +     * text will not match any other {@link Character#SPACE_SEPARATOR SPACE_SEPARATOR}s
      +     * in the pattern with the strict parse style.
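      To illustrate the parseLenient() and parseStrict() wording with several Zs characters, the following sketch builds a formatter from an explicit pattern whose separator is NNBSP, mirroring the CLDR 42 English time pattern. It assumes JDK 23 or later for the lenient results; the class SpaceVariants is hypothetical. Lenient parsing should accept U+0020, U+00A0 (NO-BREAK SPACE), U+2009 (THIN SPACE), and U+202F alike, while strict parsing accepts only the exact U+202F:

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.util.Locale;

final class SpaceVariants {
    // Explicit pattern whose separator before the am/pm marker is NNBSP (U+202F).
    private static final String PATTERN = "h:mm\u202Fa";

    // Sets the parse style first, so it covers every element appended afterwards.
    static DateTimeFormatter formatter(boolean lenient) {
        DateTimeFormatterBuilder b = new DateTimeFormatterBuilder();
        if (lenient) {
            b.parseLenient();
        } else {
            b.parseStrict();
        }
        return b.appendPattern(PATTERN).toFormatter(Locale.ENGLISH);
    }

    // Returns true if the text parses to a LocalTime with the chosen style.
    static boolean accepts(String text, boolean lenient) {
        try {
            LocalTime.parse(text, formatter(lenient));
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // U+0020 SPACE, U+00A0 NO-BREAK SPACE, U+2009 THIN SPACE, U+202F NNBSP: all in Zs
        String[] separators = {"\u0020", "\u00A0", "\u2009", "\u202F"};
        for (String sep : separators) {
            String text = "3:30" + sep + "PM";
            System.out.printf("U+%04X lenient=%b strict=%b%n",
                    (int) sep.charAt(0), accepts(text, true), accepts(text, false));
        }
    }
}
```

      Because parseLenient() and parseStrict() change the parse style for the remainder of the formatter, the sketch calls them before appendPattern() so that the setting covers the separator.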

      In java.text.DateFormat.setLenient(boolean), add the following:

      +     * @implSpec A {@link Character#SPACE_SEPARATOR SPACE_SEPARATOR} in the input
      +     * text will match any other {@link Character#SPACE_SEPARATOR SPACE_SEPARATOR}s
      +     * in the pattern with lenient parsing; otherwise, it will not match.

            Assignee: Naoto Sato
            Reporter: Naoto Sato
            Reviewed By: Joe Wang, Roger Riggs