Type: Bug
Resolution: Fixed
Priority: P4
Fix Version/s: 7
Affects Version/s: 1.4.0, 1.4.2
Component/s: core-libs
Labels:
- BOM
- Charset
- UTF
- byte
- i18n
- mr
- order

Subcomponent: java.nio
Resolved In Build: b31
CPU: generic
OS: generic
Verification: Not verified

Name: poR10007 Date: 09/23/2002

The java.nio.charset.Charset specification states that an initial byte order mark (BOM)
is omitted when decoding any UTF-encoded byte sequence:

  "In any case, when a byte-order mark is read at the beginning of a decoding operation
   it is omitted from the resulting sequence of characters."

However, according to the Unicode Standard, in the UTF-16BE and UTF-16LE character encoding
schemes an initial byte order mark should be interpreted as a ZERO WIDTH NO-BREAK SPACE.

The Unicode Standard, Version 3.0, Section 3.8 "Transformations" reads:

D33 UTF-16BE is the Unicode Transformation Format that serializes a Unicode value as
     a sequence of two bytes, in big-endian format. An initial sequence corresponding
     to U+FEFF is interpreted as a ZERO WIDTH NO-BREAK SPACE.

D34 UTF-16LE is the Unicode Transformation Format that serializes a Unicode value as
     a sequence of two bytes, in little-endian format. An initial sequence corresponding
     to U+FEFF is interpreted as a ZERO WIDTH NO-BREAK SPACE.

A byte order mark does not make sense for the UTF-8 encoding either, so in this case an
initial U+FEFF should also be interpreted as a ZERO WIDTH NO-BREAK SPACE.

JDK 1.4.2-beta-b02 meets the Unicode Standard requirements: it omits the initial BOM when
decoding a UTF-16 byte sequence and interprets it as a ZERO WIDTH NO-BREAK SPACE when
decoding UTF-8, UTF-16BE, or UTF-16LE.
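
The behavior described above can be checked with a small program. The following is only
a sketch and not part of the original report: the class name BomDecodeDemo and the helper
codePoints are made up for illustration, and it assumes JDK 7 or later (it uses
java.nio.charset.StandardCharsets). It decodes the letter "A" preceded by an initial
U+FEFF under each charset and prints the resulting characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not taken from the bug report.
final class BomDecodeDemo {

    // Decodes the bytes with the given charset and renders each resulting
    // char as U+XXXX, separated by single spaces.
    static String codePoints(byte[] bytes, Charset cs) {
        String decoded = new String(bytes, cs);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < decoded.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) decoded.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+FEFF followed by 'A', serialized big-endian, little-endian, and as UTF-8.
        byte[] be = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x41};
        byte[] le = {(byte) 0xFF, (byte) 0xFE, 0x41, 0x00};
        byte[] utf8 = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x41};

        // UTF-16 uses the initial mark to pick the byte order and omits it.
        System.out.println("UTF-16   : " + codePoints(be, StandardCharsets.UTF_16));
        // The remaining charsets keep the initial U+FEFF as a character.
        System.out.println("UTF-16BE : " + codePoints(be, StandardCharsets.UTF_16BE));
        System.out.println("UTF-16LE : " + codePoints(le, StandardCharsets.UTF_16LE));
        System.out.println("UTF-8    : " + codePoints(utf8, StandardCharsets.UTF_8));
    }
}
```

If the implementation behaves as described in this report, only the UTF-16 line should
show a single character (U+0041); the other three should show U+FEFF followed by U+0041.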

======================================================================

Assignee: Xueming Shen
Reporter: Pas Pas (Inactive)
Votes: 0
Watchers: 0
Created: 2002-09-23 21:58
Updated: 2017-05-16 10:19
Resolved: 2011-05-17 19:33
Imported: 17/Sep/12 9:16 PM
Indexed: 28/Jul/12 4:39 AM
