Type: Bug
Resolution: Won't Fix
Priority: P4
Fix Version/s: None
Affects Version/s: 6
Component/s: core-libs
Labels: webbug
Subcomponent: java.nio.charsets
CPU: x86
OS: windows_xp

FULL PRODUCT VERSION :
C:\Programme\Java\jdk1.6.0_03\bin>java -version
java version "1.6.0_03"
Java(TM) SE Runtime Environment (build 1.6.0_03-b05)
Java HotSpot(TM) Client VM (build 1.6.0_03-b05, mixed mode)

ADDITIONAL OS VERSION INFORMATION :
Windows XP SR-2

A DESCRIPTION OF THE PROBLEM :
RFC 3629 states that "Implementations of the decoding algorithm MUST protect against decoding invalid sequences."

The current UTF-8 decoder does not reject the invalid sequences from "ED A0 80" to "ED BF BF", which encode the surrogate range U+D800 to U+DFFF. Instead, it produces surrogates, as a CESU-8 decoder would.
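
To show why that byte range lands on surrogates, here is a minimal sketch of the plain three-byte UTF-8 bit layout with no validation. It is not the JDK's decoder; the class `ThreeByteUtf8` and the method `combine` are hypothetical names used only for illustration.

```java
class ThreeByteUtf8 {
    // Hypothetical helper: combines a three-byte UTF-8 sequence
    // (1110xxxx 10xxxxxx 10xxxxxx) into its 16-bit value.
    // Deliberately performs no validity checks.
    static int combine(int b1, int b2, int b3) {
        return ((b1 & 0x0F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F);
    }

    public static void main(String[] args) {
        int first = combine(0xED, 0xA0, 0x80); // lower end of the range
        int last = combine(0xED, 0xBF, 0xBF);  // upper end of the range
        System.out.printf("ED A0 80 -> U+%04X, high surrogate: %b%n",
                first, Character.isHighSurrogate((char) first));
        System.out.printf("ED BF BF -> U+%04X, low surrogate: %b%n",
                last, Character.isLowSurrogate((char) last));
    }
}
```

The two ends of the range combine to U+D800 and U+DFFF, so a decoder that applies only this bit layout will emit surrogates unless it checks for them explicitly.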

Maybe this is by design. If so, it should at least be documented prominently, and any surrogate pairs the decoder creates should be valid.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1.) Decode the following byte sequence with the UTF-8 decoder: "ED A0 80 ED BF BF"
2.) Decode the following byte sequence with the UTF-8 decoder: "ED BF BF ED A0 80"
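
The two steps above could be run with something like the following. This is a sketch, not the reporter's original test case: the class `Utf8SurrogateCheck` and the method `isRejected` are hypothetical names, and the decoder is configured with `CodingErrorAction.REPORT` so that malformed input surfaces as a `CharacterCodingException` instead of being silently replaced. What it prints depends on the JDK it is run on.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

class Utf8SurrogateCheck {
    // Hypothetical helper: returns true if the UTF-8 decoder reports
    // the input as malformed (or unmappable), false if it decodes it.
    static boolean isRejected(byte[] input) {
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(input));
            return false;
        } catch (CharacterCodingException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Step 1: ED A0 80 followed by ED BF BF
        byte[] step1 = {(byte) 0xED, (byte) 0xA0, (byte) 0x80,
                        (byte) 0xED, (byte) 0xBF, (byte) 0xBF};
        // Step 2: the same two sequences in reverse order
        byte[] step2 = {(byte) 0xED, (byte) 0xBF, (byte) 0xBF,
                        (byte) 0xED, (byte) 0xA0, (byte) 0x80};
        System.out.println("step 1 rejected: " + isRejected(step1));
        System.out.println("step 2 rejected: " + isRejected(step2));
    }
}
```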

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
1.) CoderResult.isMalformed()
2.) CoderResult.isMalformed()

ACTUAL -
1.) valid surrogate pair: U+D800 + U+DFFF
2.) invalid surrogate pair: U+DFFF + U+D800
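
The difference between the two results can be checked with `Character.isSurrogatePair`, which requires a high surrogate followed by a low surrogate. A minimal sketch, in which the class `SurrogateOrder` and the method `isWellFormedPair` are hypothetical names:

```java
class SurrogateOrder {
    // Hypothetical helper: true only for a high surrogate followed
    // by a low surrogate, which is the only well-formed ordering.
    static boolean isWellFormedPair(char first, char second) {
        return Character.isSurrogatePair(first, second);
    }

    public static void main(String[] args) {
        char high = '\uD800';
        char low = '\uDFFF';
        System.out.println("U+D800 + U+DFFF well formed: " + isWellFormedPair(high, low));
        System.out.println("U+DFFF + U+D800 well formed: " + isWellFormedPair(low, high));
    }
}
```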

REPRODUCIBILITY :
This bug is always reproducible.

Assignee: Unassigned
Reporter: Nelson Dcosta (Inactive)
Votes: 0
Watchers: 0
Created: 2009-01-28 02:22
Updated: 2011-02-16 11:15
Resolved: 2009-03-04 10:23
Imported: 15/Sep/12 1:25 PM
Indexed: 17/Jul/12 10:56 AM
