JDK-6392639

Many encoders cannot handle combining character sequences


    • Type: Bug
    • Resolution: Not an Issue
    • Priority: P4
    • None
    • 5.0
    • core-libs

      FULL PRODUCT VERSION :
      java version "1.5.0_06"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_06-b05)
      Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_06-b05, mixed mode)


      ADDITIONAL OS VERSION INFORMATION :
      Linux ifs.dk 2.6.14-1.1644_FC4 #1 Sun Nov 27 03:24:54 EST 2005 x86_64 x86_64 x86_64 GNU/Linux


      A DESCRIPTION OF THE PROBLEM :
      A UTF-8 encoded text may represent characters with diacritics, for example ä or å, as combining character sequences (as they are apparently called). This means that å, for example, is represented as 'a' followed by code point U+030A (COMBINING RING ABOVE).

      But when the text is saved as ISO-8859-1, the two code points are written as two separate characters: 'a' and something strange, not as the correct 'å'.

      input file: eksempel.txt (utf-8)
      program: t.java

      produces out.txt (iso-8859-1)
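The behaviour described above can be reproduced without any input file. The following is a minimal sketch, not part of the original report: it assumes a newer runtime than the 1.5.0_06 listed here, since `java.text.Normalizer` was added in Java 6 and `StandardCharsets` in Java 7. The class name `ComposeBeforeEncode` and the helper `toLatin1` are hypothetical names chosen for illustration. It shows that composing the sequence to NFC before encoding yields the single ISO-8859-1 byte for 'å'.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

class ComposeBeforeEncode {
    // Hypothetical helper: compose combining sequences (NFC) first,
    // then encode the result as ISO-8859-1.
    static byte[] toLatin1(String text) {
        String composed = Normalizer.normalize(text, Normalizer.Form.NFC);
        return composed.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String decomposed = "a\u030A"; // 'a' followed by COMBINING RING ABOVE

        // Encoding the decomposed form directly: U+030A has no ISO-8859-1
        // mapping, so it is replaced by the charset's substitution byte.
        byte[] raw = decomposed.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("without normalization: " + raw.length + " bytes");

        // Encoding after NFC composition: the pair becomes U+00E5.
        byte[] composed = toLatin1(decomposed);
        System.out.println("with NFC: " + composed.length + " byte(s), 0x"
                + Integer.toHexString(composed[0] & 0xFF));
    }
}
```

In this sketch the normalization is an explicit, separate step applied by the caller; the encoder itself maps one code point at a time and does not compose sequences.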


      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      (will attach input / or mail on request)

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      abc äëö abc/
      abc è abc/
      abc æøå abc/

      ACTUAL -
      abc a?e?o? abc/
      abc è abc/
      abc æøa? abc/
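The '?' characters in the actual output are consistent with the encoder substituting its replacement byte for each combining mark it cannot map. As a hedged sketch (the class name `StrictLatin1` and the helper `encodesCleanly` are hypothetical, and `StandardCharsets` assumes Java 7 or later), a `CharsetEncoder` configured with `CodingErrorAction.REPORT` can be used to detect such input instead of silently writing '?':

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictLatin1 {
    // Hypothetical helper: returns true only if every character in the
    // text has an ISO-8859-1 mapping, so nothing would be replaced by '?'.
    static boolean encodesCleanly(String text) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            encoder.encode(CharBuffer.wrap(text));
            return true;
        } catch (CharacterCodingException e) {
            // Thrown for unmappable characters such as U+030A.
            return false;
        }
    }

    public static void main(String[] args) {
        // Precomposed æøå: every character exists in ISO-8859-1.
        System.out.println(encodesCleanly("abc \u00E6\u00F8\u00E5 abc/"));
        // å written as 'a' + U+030A: the combining mark is unmappable.
        System.out.println(encodesCleanly("abc \u00E6\u00F8a\u030A abc/"));
    }
}
```

A check like this makes the data loss visible at the point of writing, which is where the reporter's program currently gets 'a?' without any error.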


      REPRODUCIBILITY :
      This bug can be reproduced always.

      ---------- BEGIN SOURCE ----------
      import java.io.*;

      public class t {
        public static void main(String[] args) {
          try {
            File f = new File("eksempel.txt");
            File fo = new File("out.txt");
            // FileReader fr = new FileReader(f);
            FileInputStream fis = new FileInputStream(f);
            InputStreamReader isr = new InputStreamReader(fis, "UTF-8");

            FileOutputStream fos = new FileOutputStream(fo);
            OutputStreamWriter osw = new OutputStreamWriter(fos, "ISO-8859-1");

            // BufferedReader br = new BufferedReader(fr);
            // String line;
            // while ((line = br.readLine()) != null) {
            //   System.out.println(line);
            // }

            // Read the decoded input into the buffer (at most 1024 chars).
            int cnt = 0;
            int totalCnt = 0;
            char[] buffer = new char[1024];
            while (totalCnt < buffer.length
                && (cnt = isr.read(buffer, totalCnt, buffer.length - totalCnt)) != -1) {
              totalCnt += cnt;
            }
            System.out.println("Read " + totalCnt);
            System.out.println(new String(buffer, 0, totalCnt));

            // Write the same characters back out, encoded as ISO-8859-1.
            osw.write(new String(buffer, 0, totalCnt));
            osw.close();
            fos.close();

            isr.close();
            fis.close();

            /*
            int ch;
            while ((ch = fr.read()) != -1) {
              System.out.println(new Character((char) ch) + " = " + Integer.toOctalString(ch));
            }
            */
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      }


      ---------- END SOURCE ----------

            sherman Xueming Shen
            ndcosta Nelson Dcosta (Inactive)