JDK-8156118

File class reports NFD normalised filenames stored on network storage as NFC


      FULL PRODUCT VERSION :
      java version "1.7.0_79"
      Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
      Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)


      ADDITIONAL OS VERSION INFORMATION :
      Darwin Dans-Mac-mini.local 13.4.0 Darwin Kernel Version 13.4.0: Wed Mar 18 16:20:14 PDT 2015; root:xnu-2422.115.14~1/RELEASE_X86_64 x86_64

      EXTRA RELEVANT SYSTEM CONFIGURATION :
      Connected to a NAS over Samba

      A DESCRIPTION OF THE PROBLEM :
      When listFiles() is used to list files, a file whose name is stored in Unicode NFD normalisation form is reported with an NFC-normalised name instead.

      On non-HFS filesystems (such as one on network attached storage), the returned File objects then refer to names that are not present on the volume: exists() returns false and the files cannot be opened for reading.

      Here's a particular file that I am trying to use:

      $ ls /Volumes/music.withoutart/2422/Johann_Sebastian_Bach/Orgelwerke_\(Karl_Richter\)_\(cd_3\)/06_Choral\,_BWV_768_Sei_Gegrüßet\,_Jesu_Gütig.flac | xxd
      0000000: 2f56 6f6c 756d 6573 2f6d 7573 6963 2e77 /Volumes/music.w
      0000010: 6974 686f 7574 6172 742f 3234 3232 2f4a ithoutart/2422/J
      0000020: 6f68 616e 6e5f 5365 6261 7374 6961 6e5f ohann_Sebastian_
      0000030: 4261 6368 2f4f 7267 656c 7765 726b 655f Bach/Orgelwerke_
      0000040: 284b 6172 6c5f 5269 6368 7465 7229 5f28 (Karl_Richter)_(
      0000050: 6364 5f33 292f 3036 5f43 686f 7261 6c2c cd_3)/06_Choral,
      0000060: 5f42 5756 5f37 3638 5f53 6569 5f47 6567 _BWV_768_Sei_Geg
      0000070: 7275 cc88 c39f 6574 2c5f 4a65 7375 5f47 ru....et,_Jesu_G
      0000080: 75cc 8874 6967 2e66 6c61 630a u..tig.flac.

      Note the NFD version of ü I have used in the ls command. It is encoded as "75 cc88" (u followed by the combining diaeresis U+0308), and is followed by "c39f" (ß).

      Now I copy and paste what I receive from listFiles():

      $ printf "06-Choral,_BWV_768_Sei_Gegrüßet,_Jesu_Gütig.flac" | xxd
      0000000: 3036 2d43 686f 7261 6c2c 5f42 5756 5f37 06-Choral,_BWV_7
      0000010: 3638 5f53 6569 5f47 6567 72c3 bcc3 9f65 68_Sei_Gegr....e
      0000020: 742c 5f4a 6573 755f 47c3 bc74 6967 2e66 t,_Jesu_G..tig.f
      0000030: 6c61 63 lac

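      Here, ü is encoded as "c3bc" (the precomposed NFC character U+00FC), again followed by "c39f" (ß). The name returned by listFiles() does not match the bytes stored on the share, so the file cannot be read.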
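
      The difference between the two byte sequences can be reproduced without any filesystem involved. The following is a minimal sketch, not part of the original report; the class name NormalizationBytes and the helper utf8Hex are hypothetical names chosen for illustration. It uses java.text.Normalizer to produce both forms of ü and prints their UTF-8 bytes.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

// Illustrative only: class and method names are not from the bug report.
public class NormalizationBytes {

    // Returns the UTF-8 encoding of s as lowercase hex, e.g. "c3bc".
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00FC is the precomposed (NFC) form of u-with-diaeresis.
        String nfc = Normalizer.normalize("\u00fc", Normalizer.Form.NFC);
        // NFD decomposes it into 'u' followed by U+0308 COMBINING DIAERESIS.
        String nfd = Normalizer.normalize("\u00fc", Normalizer.Form.NFD);

        System.out.println("NFC: " + utf8Hex(nfc));        // prints "NFC: c3bc"
        System.out.println("NFD: " + utf8Hex(nfd));        // prints "NFD: 75cc88"
        System.out.println("equal: " + nfc.equals(nfd));   // prints "equal: false"
    }
}
```

      The two strings render identically but are different Java strings and different byte sequences, which is why a filesystem that compares names byte for byte treats them as different files.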

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      1) Create a file with an NFD name (hopefully this POSTs correctly; the name must be u followed by U+0308, not the precomposed U+00FC) on a drive connected via network storage (e.g. via Samba, NFS, whatever):

      mkdir testDir
      touch testDir/ü

      Put this inside a "nfdtest" folder and expose it for sharing via samba.

      2) Run the following simple program to see it [not] work:

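      import java.io.File;

      public class NfdTest {

          public static void main(String[] args) {
              // Directory on the Samba share that contains the NFD-named file
              File testDir = new File("/Volumes/nfdtest/testDir");
              // Assumes the share is mounted and testDir holds exactly one file
              File nfdEncodedFile = testDir.listFiles()[0];
              // The listed file should exist, but false is printed
              System.out.println("Exists: " + nfdEncodedFile.exists());
          }
      }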
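
      To narrow down what listFiles() returns, a diagnostic variant of the program can report the normalisation form of each listed name and also probe the NFD form of the same name. This is a sketch under assumptions: the class name NfdDiagnostics and the helper form are hypothetical, and whether the NFD probe succeeds depends on how the JDK converts path strings on the platform, so it is a diagnostic rather than a confirmed workaround.

```java
import java.io.File;
import java.text.Normalizer;

// Illustrative only: class and method names are not from the bug report.
public class NfdDiagnostics {

    // Reports which Unicode normalisation form(s) the name is already in.
    static String form(String name) {
        boolean nfc = Normalizer.isNormalized(name, Normalizer.Form.NFC);
        boolean nfd = Normalizer.isNormalized(name, Normalizer.Form.NFD);
        if (nfc && nfd) return "NFC+NFD"; // e.g. plain ASCII names
        if (nfc) return "NFC";
        if (nfd) return "NFD";
        return "mixed";
    }

    public static void main(String[] args) {
        // Default path taken from the reproduction steps; override via args[0].
        File dir = new File(args.length > 0 ? args[0] : "/Volumes/nfdtest/testDir");
        File[] entries = dir.listFiles();
        if (entries == null) {
            System.err.println("Cannot list " + dir);
            return;
        }
        for (File f : entries) {
            // Rebuild the same name in NFD and test it separately.
            String nfdName = Normalizer.normalize(f.getName(), Normalizer.Form.NFD);
            File nfdFile = new File(dir, nfdName);
            System.out.println(form(f.getName())
                    + " exists=" + f.exists()
                    + " existsAsNfd=" + nfdFile.exists());
        }
    }
}
```

      If the first column shows NFC for a file that was created with an NFD name, the listing has been renormalised somewhere between the share and the Java string.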

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      The program should output "Exists: true".
      ACTUAL -
      The program outputs "Exists: false".

      REPRODUCIBILITY :
      This bug can be reproduced always.

      ---------- BEGIN SOURCE ----------
      As above.
      ---------- END SOURCE ----------

      CUSTOMER SUBMITTED WORKAROUND :
      None that I have found.

      The nio.Path API seems to have the same problem.
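
      The java.nio.file observation can be checked with an equivalent program. This is a minimal sketch, not the reporter's code; the class name NfdNioTest and the helper countMissing are hypothetical, and the default path is the one from the reproduction steps above.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative only: class and method names are not from the bug report.
public class NfdNioTest {

    // Lists dir and returns how many listed entries Files.exists() rejects.
    static int countMissing(Path dir) throws IOException {
        int missing = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                boolean exists = Files.exists(p);
                System.out.println(p.getFileName() + " exists: " + exists);
                if (!exists) {
                    missing++;
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Default path taken from the reproduction steps; override via args[0].
        Path dir = Paths.get(args.length > 0 ? args[0] : "/Volumes/nfdtest/testDir");
        System.out.println("Missing entries: " + countMissing(dir));
    }
}
```

      A non-zero count for a directory that was just listed would indicate the same mismatch as with java.io.File.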

            bpb Brian Burkhalter
            webbuggrp Webbug Group