JDK-8321737

Issues with Umlauts when starting Java from Mac with SSH on Linux / Native Encoding


    • Type: Bug
    • Resolution: Not an Issue
    • Priority: P4
    • None
    • 17, 21
    • core-libs
    • None
    • Mac for SSH, Linux as runtime

       

      I've noticed that some servers of mine only work properly when started in a certain way.

      When the server misbehaves, paths that contain umlauts cannot be converted to a Path, which can be tested with the following snippet:

      Reproduce:

      new java.io.File("/folder/ümläüuts").toPath()

      It throws the following exception:

      java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters: /folder/??ml????uts
              at java.base/sun.nio.fs.UnixPath.encode(UnixPath.java:121)
              at java.base/sun.nio.fs.UnixPath.<init>(UnixPath.java:68)
              at java.base/sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:279)
              at java.base/java.io.File.toPath(File.java:2387)
              at com.jpro.internal.server.JProInitializer$.$anonfun$initialize$4(JProInitializer.scala:120)
              at com.jpro.internal.server.JProInitializer$.execute(JProInitializer.scala:70)
              at com.jpro.internal.server.JProInitializer$.initialize(JProInitializer.scala:118)
              at com.jpro.internal.server.JProInitializer$.init(JProInitializer.scala:32)
              at com.jpro.internal.server.JProStarter$.$anonfun$startFromBoot$2(JProStarter.scala:34)
              at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)


      Effects:
       - Certain filenames cannot be handled
       - File upload on the server fails in production (but works during development on Mac)
       - Other unexpected effects



      How does it happen?
      This happens because of the following environment variable:
       LC_CTYPE=UTF-8
      This gets "inherited" from the Mac Terminal, but I think it should instead be "C.UTF-8"
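
      To see which values a given JVM actually picked up, a small diagnostic along these lines may help. It is only a sketch: the class name LocaleDiagnostics is hypothetical, sun.jnu.encoding is an internal, unsupported property, and native.encoding is only defined on JDK 17 and later (it prints as null elsewhere).

```java
import java.nio.charset.Charset;

public class LocaleDiagnostics {
    /** Collects the locale-related environment variables and the encodings the JVM derived from them. */
    static String describe() {
        StringBuilder sb = new StringBuilder();
        // Environment as seen by this JVM process (null means the variable is not set).
        for (String name : new String[] {"LC_ALL", "LC_CTYPE", "LANG"}) {
            sb.append(name).append('=').append(System.getenv(name)).append('\n');
        }
        // Encoding-related system properties; sun.jnu.encoding is internal and may change.
        for (String key : new String[] {"sun.jnu.encoding", "native.encoding", "file.encoding"}) {
            sb.append(key).append('=').append(System.getProperty(key)).append('\n');
        }
        sb.append("defaultCharset=").append(Charset.defaultCharset()).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

      Running it once through `ssh user@machine "java LocaleDiagnostics"` and once from an interactive shell should make any difference between the two start modes visible.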

      Possible workarounds are:

       1. Setting to C.UTF-8
          export LC_CTYPE=C.UTF-8
       2. Unsetting LC_CTYPE
          unset LC_CTYPE
       3. Setting LC_ALL
          export LC_ALL=C.UTF-8
       4. SSH into the machine first and run the command from the interactive shell, instead of passing the command line directly to ssh.
          Fails:
           > ssh ..... user@machine "java <your-command>"
          Works:
           > ssh ..... user@machine
           > > java <your-command>
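
      If the server is started by a Java launcher rather than a shell script, the same environment fix can be applied when the child process is created. The following is a minimal sketch, assuming that the C.UTF-8 locale is available on the Linux host; the class name Utf8Launcher and the method withUtf8Locale are hypothetical.

```java
import java.util.List;
import java.util.Map;

public class Utf8Launcher {
    /** Builds a child process whose locale is forced to C.UTF-8, whatever was inherited. */
    static ProcessBuilder withUtf8Locale(List<String> command) {
        ProcessBuilder pb = new ProcessBuilder(command);
        Map<String, String> env = pb.environment();
        env.remove("LC_CTYPE");        // drop the value inherited through SSH
        env.put("LC_ALL", "C.UTF-8");  // LC_ALL takes precedence over LC_CTYPE and LANG
        return pb.inheritIO();         // let the child share this process's stdin/stdout/stderr
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java Utf8Launcher <command> [args...]");
            System.exit(2);
        }
        Process child = withUtf8Locale(List.of(args)).start();
        System.exit(child.waitFor());
    }
}
```

      The launcher JVM itself may still run with the broken locale; only the child process sees the corrected environment.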

       What does not work:
        1. Setting the JVM encoding with parameters like -Dfile.encoding=UTF-8 or -Dsun.jnu.encoding=UTF-8 doesn't make a difference.


      For now, I've added a check to my product that verifies umlauts work properly, so that misconfigured setups fail hard instead of producing a nearly unnoticeable error. (Currently, only file upload fails on misconfiguration.)
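
      Such a startup check could look roughly like the sketch below. It is not the check used in the product; the class and method names are hypothetical, and the test path is the one from the reproducer above, written with Unicode escapes so that the source file encoding does not matter.

```java
import java.nio.file.InvalidPathException;
import java.nio.file.Paths;

public class UmlautCheck {
    // "/folder/ümläüuts" from the reproducer, spelled with Unicode escapes.
    private static final String TEST_PATH = "/folder/\u00fcml\u00e4\u00fcuts";

    /** Returns true if the default file system can convert a path containing umlauts. */
    static boolean umlautsSupported() {
        try {
            Paths.get(TEST_PATH); // only converts the string; does not touch the disk
            return true;
        } catch (InvalidPathException e) {
            return false;
        }
    }

    /** Aborts startup with a descriptive error when umlauts cannot be handled. */
    static void failFastIfBroken() {
        if (!umlautsSupported()) {
            throw new IllegalStateException(
                "Paths with non-ASCII characters are not supported in this environment; "
                + "check LC_ALL/LC_CTYPE (current LC_CTYPE=" + System.getenv("LC_CTYPE") + ")");
        }
    }

    public static void main(String[] args) {
        failFastIfBroken();
        System.out.println("Umlaut check passed");
    }
}
```

      Calling failFastIfBroken() early in server startup turns the silent misconfiguration into an immediate, explainable failure.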


      I think what happens is that the JVM internally sets an "internal encoding for native code" based on these variables.
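
      The effect can be illustrated without touching the locale at all. The sketch below assumes that, when the locale is unusable, the JVM ends up with an ASCII-like charset for native strings; it is not the JDK's actual code path, just a strict encoder that reports unmappable characters instead of replacing them. The class name EncodeDemo is hypothetical.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodeDemo {
    /** Returns true if every character of s can be encoded in cs without replacement. */
    static boolean encodable(String s, Charset cs) {
        try {
            cs.newEncoder()
              .onMalformedInput(CodingErrorAction.REPORT)
              .onUnmappableCharacter(CodingErrorAction.REPORT)
              .encode(CharBuffer.wrap(s));
            return true;
        } catch (CharacterCodingException e) {
            return false; // at least one character has no mapping in cs
        }
    }

    public static void main(String[] args) {
        String path = "/folder/\u00fcml\u00e4\u00fcuts"; // "/folder/ümläüuts"
        System.out.println(encodable(path, StandardCharsets.US_ASCII)); // prints false
        System.out.println(encodable(path, StandardCharsets.UTF_8));    // prints true
    }
}
```

      This would also be consistent with -Dfile.encoding having no effect, since that property governs the default charset for file contents rather than the conversion of path names.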

      Of course, this is caused by SSH changing env variables, but I still think it should not break the behavior of the JVM.

            Assignee: Naoto Sato (naoto)
            Reporter: Florian Kirmaier (fkirmaier)