-
Bug
-
Resolution: Not an Issue
-
P4
-
None
-
5.0
-
x86
-
linux
FULL PRODUCT VERSION :
java version "1.5.0_06"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_06-b05)
Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_06-b05, mixed mode)
ADDITIONAL OS VERSION INFORMATION :
Linux ifs.dk 2.6.14-1.1644_FC4 #1 Sun Nov 27 03:24:54 EST 2005 x86_64 x86_64 x86_64 GNU/Linux
A DESCRIPTION OF THE PROBLEM :
The UTF-8 encoded input text represents accented letters such as ä or å with combining diacritical marks (as they are apparently called). For example, å is stored as 'a' followed by code point 0x30A (COMBINING RING ABOVE), and ä as 'a' followed by code point 0x308 (COMBINING DIAERESIS).
When the text is saved as ISO-8859-1, each such pair is written as two separate characters, 'a' followed by a substitute character ('?'), instead of the single correct 'å'.
input file: eksempel.txt (utf-8)
program: t.java
produces out.txt (iso-8859-1)
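The behavior described above is consistent with the encoder mapping each char on its own, with no composition step. A minimal sketch of one possible workaround follows, assuming a newer runtime than the one reported: `java.text.Normalizer` exists only from Java 6 and `StandardCharsets` from Java 7, so neither is available on 1.5.0_06. The class name `NormalizeBeforeEncode` and the helper `toLatin1` are hypothetical names chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class NormalizeBeforeEncode {
    // Compose base letter + combining mark pairs (NFC) before encoding,
    // so that 'a' + U+030A becomes the single code point U+00E5.
    static byte[] toLatin1(String text) {
        String composed = Normalizer.normalize(text, Normalizer.Form.NFC);
        return composed.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String decomposed = "a\u030A"; // 'a' + COMBINING RING ABOVE

        // Without normalization the combining mark has no ISO-8859-1 mapping
        // and is replaced, giving two bytes.
        byte[] raw = decomposed.getBytes(StandardCharsets.ISO_8859_1);
        // With NFC normalization the pair is encoded as one byte.
        byte[] fixed = toLatin1(decomposed);

        System.out.println("raw bytes: " + raw.length);
        System.out.println("normalized bytes: " + fixed.length);
        System.out.println("normalized value: 0x" + Integer.toHexString(fixed[0] & 0xFF));
    }
}
```

In a stream-based program the same idea would apply to the string passed to the writer: normalize first, then write.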
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
(will attach input / or mail on request)
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
abc äëö abc/
abc è abc/
abc æøå abc/
ACTUAL -
abc a?e?o? abc/
abc è abc/
abc æøa? abc/
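The '?' characters in the actual output suggest that the combining marks were silently replaced because ISO-8859-1 has no mapping for them. As a hedged sketch, such characters can be detected up front by configuring a `CharsetEncoder` to report unmappable input instead of substituting it. The class name `StrictLatin1` and the method `encodable` are hypothetical, and `StandardCharsets` assumes Java 7 or later.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictLatin1 {
    // Returns true only if every char in text has an ISO-8859-1 mapping.
    static boolean encodable(String text) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder()
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .onMalformedInput(CodingErrorAction.REPORT);
        try {
            encoder.encode(CharBuffer.wrap(text));
            return true;
        } catch (CharacterCodingException e) {
            // Raised instead of writing a '?' substitute.
            return false;
        }
    }

    public static void main(String[] args) {
        // Precomposed U+00E4 maps to a single ISO-8859-1 byte.
        System.out.println(encodable("abc \u00E4 abc"));
        // 'a' + U+0308 (COMBINING DIAERESIS) contains an unmappable char.
        System.out.println(encodable("abc a\u0308 abc"));
    }
}
```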
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
import java.io.*;

public class t {
    static public void main(String arg[]) {
        try {
            File f = new File("eksempel.txt");
            File fo = new File("out.txt");
            // FileReader fr=new FileReader(f);
            // Decode the input as UTF-8 and encode the output as ISO-8859-1.
            FileInputStream fis = new FileInputStream(f);
            InputStreamReader isr = new InputStreamReader(fis, "UTF-8");
            FileOutputStream fos = new FileOutputStream(fo);
            OutputStreamWriter osw = new OutputStreamWriter(fos, "ISO-8859-1");
            // BufferedReader br=new BufferedReader(fr);
            // String line;
            // while ( (line=br.readLine())!= null ) {
            //     System.out.println(line);
            // }
            int cnt = 0;
            int totalCnt = 0;
            char buffer[] = new char[1024];
            // Fill the buffer until end of input or until it is full.
            // The remaining space is buffer.length - totalCnt (not 1024 - cnt).
            while (totalCnt < buffer.length
                    && (cnt = isr.read(buffer, totalCnt, buffer.length - totalCnt)) != -1) {
                totalCnt += cnt;
            }
            System.out.println("Read " + totalCnt);
            System.out.println(new String(buffer, 0, totalCnt));
            // Write the decoded text unchanged; no normalization is applied.
            osw.write(new String(buffer, 0, totalCnt));
            osw.close();
            fos.close();
            isr.close();
            fis.close();
            /*
            int ch;
            while ( (ch=fr.read())!=-1 ) {
                System.out.println(new Character((char) ch) + " = " +Integer.toOctalString(ch));
            }
            */
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
---------- END SOURCE ----------