JDK-8162518

(fs) Paths.get(URI) on Unix doesn't allow non-ASCII chars


    • x86
    • linux_ubuntu, os_x

      FULL PRODUCT VERSION :
      java version "1.8.0_91"
      Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
      Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)

      ADDITIONAL OS VERSION INFORMATION :
      MacOSX - 15.5.0 Darwin Kernel Version 15.5.0: Tue Apr 19 18:36:36 PDT 2016; root:xnu-3248.50.21~8/RELEASE_X86_64 x86_64

      A DESCRIPTION OF THE PROBLEM :
      If a file URI contains a non-ASCII character and a Path is created from it by calling Paths.get(uri), the non-ASCII characters are not handled correctly.

      For example...

      URI uri = new URI("file:///this/is/español/test.txt");
      Path path = Paths.get(uri);
      System.out.println("path toString() = " + path);

      This prints "/this/is/espa�ol/test.txt" on Mac, but works on Windows. From that point on there is no way to get the original path. Calling...

      System.out.println("path getName(2) = " + path.getName(2));

      ...also prints "espa�ol" and calling...

      path.getName(2).toString().equals("español");

      ...returns true on Windows but false on Mac.
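
      A possible workaround, sketched here under assumptions rather than verified on the affected build: percent-encode the non-ASCII characters with URI.toASCIIString() before handing the URI to Paths.get. The class name AsciiUriPaths and the helper toAsciiUri are hypothetical names introduced for illustration. Whether the resulting Path prints correctly still depends on the platform's file-name encoding, which is assumed to be UTF-8 here.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, not part of the JDK or of the original report.
final class AsciiUriPaths {

    // Re-creates the URI from its ASCII form, in which every non-ASCII
    // character is replaced by the percent-escaped bytes of its UTF-8 encoding.
    static URI toAsciiUri(URI uri) {
        return URI.create(uri.toASCIIString());
    }

    public static void main(String[] args) {
        // \u00f1 is the letter n with tilde; the escape avoids any
        // dependence on the source-file encoding.
        URI uri = URI.create("file:///this/is/espa\u00f1ol/test.txt");
        URI ascii = toAsciiUri(uri);
        System.out.println("ASCII URI = " + ascii);

        // Assumption: the default file system decodes file names as UTF-8.
        // With a different file-name encoding this call may behave differently.
        Path path = Paths.get(ascii);
        System.out.println("path toString() = " + path);
    }
}
```

      The ASCII form of the URI contains only escaped octets, so the conversion to a Path no longer has to deal with a raw non-ASCII character in the URI string.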


      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      String uriStr = "file:///this/is/español/test.txt";
      System.out.println("URI String = " + uriStr);

      URI uri = new URI(uriStr);
      System.out.println("URI toString() = " + uri); // this is fine

      Path path = Paths.get(uri);
      // prints unknown char for ñ on Mac, fine on Windows
      System.out.println("path toString() = " + path);

      // prints unknown char for ñ on Mac, fine on Windows
      System.out.println("path getName(2) = " + path.getName(2)); // fails on Mac

      System.out.println(path.getName(2).toString().equals("español"));

      // prints incorrect encoding of URI on Mac, correctly shows "español" on Windows
      System.out.println("path URI toString() = " + path.toUri());


      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      // these are the correct results from Windows

      URI String = file:///this/is/español/test.txt
      URI toString() = file:///this/is/español/test.txt
      path toString() = /this/is/español/test.txt
      path getName(2) = español
      true
      path URI toString() = file:///this/is/español/test.txt

      ACTUAL -
      URI String = file:///this/is/español/test.txt
      URI toString() = file:///this/is/español/test.txt
      path toString() = /this/is/espa�ol/test.txt
      path getName(2) = espa�ol
      false
      path URI toString() = file:///this/is/espa%F1ol/test.txt
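
      The %F1 in the last line is consistent with the character ñ (U+00F1) having been narrowed to the single byte 0xF1 instead of being encoded as UTF-8 (0xC3 0xB1). This is an interpretation of the output above, not something the report confirms. The sketch below only illustrates that reading; the class NarrowedByteDemo and its methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not reproduce JDK internals.
final class NarrowedByteDemo {

    // Truncates every char to its low 8 bits, one byte per char.
    static byte[] narrow(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i);
        }
        return out;
    }

    // Decodes bytes as UTF-8; malformed input becomes U+FFFD.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Renders bytes as upper-case hex, two digits per byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = "espa\u00f1ol";
        // A lone 0xF1 byte is not valid UTF-8, so decoding yields U+FFFD.
        System.out.println(decodeUtf8(narrow(name)).equals("espa\uFFFDol"));
        // Narrowed byte versus the UTF-8 encoding of the same character.
        System.out.println(hex(narrow("\u00f1")));
        System.out.println(hex("\u00f1".getBytes(StandardCharsets.UTF_8)));
    }
}
```

      Under this reading, the unknown character in path toString() and the %F1 in path.toUri() would be two views of the same invalid byte.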

      REPRODUCIBILITY :
      This bug can be reproduced always.

      ---------- BEGIN SOURCE ----------
      import java.net.URI;
      import java.net.URISyntaxException;
      import java.nio.file.Path;
      import java.nio.file.Paths;

      // Assumption: JUnit 4; the report does not name the test framework.
      import org.junit.Test;

      // Class name is a placeholder; the report only gives the test method.
      public class UriToPathTest {

          @Test
          public void showURI2Path2StringResults() throws URISyntaxException {
              String uriStr = "file:///this/is/español/test.txt";
              System.out.println("URI String = " + uriStr);

              URI uri = new URI(uriStr);
              System.out.println("URI toString() = " + uri); // this is fine

              Path path = Paths.get(uri);
              // prints unknown char for ñ on Mac, fine on Windows
              System.out.println("path toString() = " + path);

              // prints unknown char for ñ on Mac, fine on Windows
              System.out.println("path getName(2) = " + path.getName(2));

              // prints false on Mac, true on Windows
              System.out.println(path.getName(2).toString().equals("español"));

              // prints incorrect encoding of URI on Mac, correctly shows "español" on Windows
              System.out.println("path URI toString() = " + path.toUri());
          }
      }

      ---------- END SOURCE ----------

            bpb Brian Burkhalter
            webbuggrp Webbug Group