JDK-8337509

Disable "best-fit" mapping on Windows command line

    • CSR
    • Resolution: Approved
    • P4
    • 24
    • tools
    • None
    • behavioral
    • low
    • Applications that rely on the "best-fit" mapping will be affected. However, expecting that behavior is inherently incorrect, so I expect such cases to be rare. A workaround is to use UTF-8 on Windows, which can be enabled in the Windows settings.
    • add/remove/modify command line option
    • Implementation

      Summary

      Disable "best-fit" mapping for the Java launcher's command-line arguments on Windows.

      Problem

      The Java launcher on Windows uses the GetCommandLineA() Win32 API to retrieve its command-line arguments. If the arguments include characters that cannot be mapped to the underlying ANSI code page, Windows applies a "best-fit" mapping where possible (https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-ucoderef/d1980631-6401-428e-a49d-d71394be7da8). This is problematic for some characters, such as quotation marks: FULLWIDTH QUOTATION MARK (U+FF02) may be converted to the ASCII double quote (U+0022), which leads to unexpected argument parsing.

      Solution

      Disable "best-fit" mapping for unmappable characters in the Java launcher's command line, replacing them with the default replacement character.
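      As a rough illustration of "replace instead of best-fit", the sketch below uses Java's own windows-1252 encoder, which does not apply best-fit mapping and substitutes its default replacement for unmappable characters. This is only an analogy: the launcher change itself is in native code, and the class and method names here (ReplacementDemo, roundTrip) are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReplacementDemo {
    // Encodes the text into the given charset and decodes it back.
    // String.getBytes(Charset) replaces unmappable characters with the
    // charset's default replacement bytes ('?' for windows-1252).
    static String roundTrip(String text, Charset cs) {
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        String fullwidthQuote = "\uFF02"; // FULLWIDTH QUOTATION MARK

        // U+FF02 is not in windows-1252, so it cannot be encoded as-is.
        System.out.println(cp1252.newEncoder().canEncode(fullwidthQuote)); // false

        // No best-fit: the character becomes '?', not an ASCII double quote.
        System.out.println(roundTrip(fullwidthQuote, cp1252)); // ?

        // UTF-8 can represent the character, so it survives unchanged.
        System.out.println(roundTrip(fullwidthQuote, StandardCharsets.UTF_8)
                .equals(fullwidthQuote)); // true
    }
}
```

      The last check mirrors the suggested workaround: with a UTF-8 code page the character is representable, so neither best-fit mapping nor replacement applies to it.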

      This comes with a small risk for applications that rely on the best-fit behavior. For example, consider the following argument string passed to the launcher:

      "He said, ＂Hello World!＂ to the world."

      Here, the quotes around Hello World! are FULLWIDTH QUOTATION MARK (U+FF02). If the Windows ANSI code page does not include that character, as is the case with Cp1252, then with this change the argument is passed as a single string containing the replacement characters, i.e.,

      He said, ?Hello World!? to the world.

      whereas it was previously split into two arguments: the fullwidth quotation marks were replaced with ASCII double quotes, so the space between Hello and World acted as an argument delimiter:

      He said, Hello
      World! to the world.

      It would be very odd for an application to depend on this split, but behavior like this will change with this modification.
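      The difference between the two outcomes can be reproduced with a deliberately simplified splitter. The sketch below assumes only two rules: an ASCII double quote toggles quoting, and a space outside quotes ends an argument. It is not the actual Windows or launcher parsing logic (backslash escapes, for instance, are ignored), and SplitDemo and split are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class SplitDemo {
    // Simplified model: '"' toggles quoting, a space outside quotes
    // ends the current argument. Escapes are intentionally ignored.
    static List<String> split(String commandLine) {
        List<String> args = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        boolean hasToken = false;
        for (char c : commandLine.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
                hasToken = true;
            } else if (c == ' ' && !inQuotes) {
                if (hasToken) {
                    args.add(current.toString());
                    current.setLength(0);
                    hasToken = false;
                }
            } else {
                current.append(c);
                hasToken = true;
            }
        }
        if (hasToken) {
            args.add(current.toString());
        }
        return args;
    }

    public static void main(String[] args) {
        // Old behavior: best-fit turned U+FF02 into ASCII double quotes.
        String bestFit = "\"He said, \"Hello World!\" to the world.\"";
        // New behavior: U+FF02 is replaced with '?'.
        String replaced = "\"He said, ?Hello World!? to the world.\"";

        for (String line : List.of(bestFit, replaced)) {
            List<String> parsed = split(line);
            System.out.println(parsed.size() + " argument(s):");
            parsed.forEach(a -> System.out.println("  " + a));
        }
    }
}
```

      Under this model the best-fit string yields two arguments and the replaced string yields one, matching the example above.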

      Specification

      N/A. This is a behavioral change only.

            naoto Naoto Sato
            Alan Bateman