Type: Bug
Resolution: Unresolved
Priority: P5
Fix Version: None
Affects Version: 6
CPU: x86
OS: linux
FULL PRODUCT VERSION :
java version "1.6.0_02"
Java(TM) SE Runtime Environment (build 1.6.0_02-b05)
Java HotSpot(TM) Client VM (build 1.6.0_02-b05, mixed mode, sharing)
java version "1.7.0-ea"
Java(TM) SE Runtime Environment (build 1.7.0-ea-b15)
Java HotSpot(TM) Client VM (build 1.7.0-ea-b15, mixed mode, sharing)
ADDITIONAL OS VERSION INFORMATION :
Linux hostname 2.6.18 #1 Wed Sep 20 03:01:24 CDT 2006 i686 athlon-4 i386 GNU/Linux
A DESCRIPTION OF THE PROBLEM :
JEditorPane does not properly display a numeric HTML character entity whose value is greater than 65535 (that is, a supplementary character above U+FFFF). Instead, it truncates the value to its low 16 bits. Supplementary characters can be displayed only by placing entities for the corresponding surrogate pair in the HTML.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Create a JEditorPane, and set its content to any HTML document containing a character entity representing a supplementary character, such as "&#66565;". It is necessary to install, or place in $JAVA_HOME/jre/lib/fonts/fallback, a font which contains this character; at this time, the only such font I know of is Code2001. If the font is not in the fallback directory, the HTML or JEditorPane must explicitly specify the font family.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The JEditorPane should display the U+10405 DESERET CAPITAL LETTER LONG OO character, which looks like an oval with a vertical line through it.
ACTUAL -
The character entity is stripped to its lowest 16 bits, which causes U+0405 CYRILLIC CAPITAL LETTER DZE (which resembles an English 'S') to be displayed.
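The arithmetic behind the observed result can be checked in isolation. The following is a minimal sketch, not the actual Swing parser code: it assumes the symptom is equivalent to a narrowing cast from int to char, and the class and method names are hypothetical.

```java
// Sketch only: mimics the reported symptom; this is not the Swing parser code.
final class EntityTruncationDemo {

    /** Keeps only the low 16 bits, as a narrowing cast from int to char does. */
    static char truncateTo16Bits(int codePoint) {
        return (char) codePoint;
    }

    /** Correct conversion: yields a surrogate pair for a supplementary code point. */
    static String toUtf16(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        int codePoint = 66565; // 0x10405, the value of the entity in this report
        char truncated = truncateTo16Bits(codePoint);
        String correct = toUtf16(codePoint);

        System.out.printf("code point : U+%X%n", codePoint);        // U+10405
        System.out.printf("truncated  : U+%04X%n", (int) truncated); // U+0405
        System.out.printf("UTF-16     : U+%04X U+%04X%n",            // U+D801 U+DC05
                (int) correct.charAt(0), (int) correct.charAt(1));
    }
}
```

The truncated value U+0405 matches the Cyrillic letter that is actually displayed, while Character.toChars produces the two UTF-16 code units that the expected character requires.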
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
import java.awt.*;
import javax.swing.*;

public class SupplementaryTest
{
    public static void main(String[] args)
    {
        EventQueue.invokeLater(new Runnable()
        {
            public void run()
            {
                // &#66565; is the numeric entity for U+10405
                // (DESERET CAPITAL LETTER LONG OO).
                JEditorPane editorPane = new JEditorPane("text/html",
                    "<html><body><p style="
                    + "'font-family: Code2001;"
                    + " font-size: 24pt;"
                    + "'>"
                    + "&#66565;"
                    + "</body></html>");
                editorPane.setEditable(false);

                JFrame frame = new JFrame("Supp Test");
                frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
                frame.getContentPane().add(new JScrollPane(editorPane));
                frame.setSize(300, 300);
                frame.setLocationByPlatform(true);
                frame.setVisible(true);
            }
        });
    }
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
The workaround is to use HTML entities for the corresponding UTF-16 surrogate pair. In the case of U+10405, placing "&#55297;&#56325;" (that is, U+D801 U+DC05) in the HTML will produce the desired character.
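The workaround can be automated by preprocessing the HTML before it is handed to the JEditorPane. The following is a sketch under stated assumptions: SurrogateEntityRewriter and its rewrite method are hypothetical names, not part of Swing, and the helper only handles numeric entities in the decimal (&#NNNN;) and hexadecimal (&#xHHHH;) forms. It relies on the behaviour described in this report, namely that entities for the two surrogate code units are displayed correctly.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Swing: rewrites numeric entities for
// supplementary characters as a pair of surrogate entities.
final class SurrogateEntityRewriter {

    // Matches decimal (&#66565;) and hexadecimal (&#x10405;) numeric entities.
    private static final Pattern NUMERIC_ENTITY =
            Pattern.compile("&#(?:([0-9]{1,7})|[xX]([0-9a-fA-F]{1,6}));");

    static String rewrite(String html) {
        Matcher m = NUMERIC_ENTITY.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = (m.group(1) != null)
                    ? Integer.parseInt(m.group(1))
                    : Integer.parseInt(m.group(2), 16);
            String replacement = m.group(); // default: leave the entity untouched
            if (Character.isSupplementaryCodePoint(codePoint)) {
                // Split the code point into its high and low surrogates.
                char[] pair = Character.toChars(codePoint);
                replacement = "&#" + (int) pair[0] + ";&#" + (int) pair[1] + ";";
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints <p>&#55297;&#56325;</p>
        System.out.println(rewrite("<p>&#66565;</p>"));
    }
}
```

Entities at or below 65535, named entities such as &amp;, and values outside the Unicode range are left unchanged, so the helper should be safe to apply to a whole document, for example as new JEditorPane("text/html", SurrogateEntityRewriter.rewrite(html)).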