Type: Bug
Resolution: Won't Fix
Priority: P4
Fix Version/s: None
Affects Version/s: None
Component/s: core-libs
Labels: None
Subcomponent: java.nio.charsets
CPU: generic
OS: generic

The attached reproducer demonstrates the issue. The test converts two byte sequences from Windows-31J to UTF_16BE. The output is below:

Windows-31J : 81 E8 81 E8
UTF_16BE : 22 2C 22 2C

Windows-31J : 81 E8 81 E9 81 E8
UTF_16BE : 22 2C FF FD 9A 55 FF FD

The first sequence consists of two identical characters (“multiple integral”, U+222C).
This character appears in the code chart at
https://en.wikipedia.org/wiki/JIS_X_0208#Character_set_0x22_(row_number_2,_special_characters)
Its position is 2-74. This sequence is converted to 222C 222C, which is the expected result.

The second sequence consists of three characters at positions 2-74, 2-75, 2-74 (an “empty cell” at position 2-75 has been inserted). One way to handle this case would be to convert the empty cell to the replacement character (FFFD), so that the sequence is converted to 222C FFFD 222C. The current behavior, however, is that only the first byte of the empty cell is converted to FFFD, and the sequence is converted to 222C FFFD 9A55 FFFD.
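A minimal sketch of such a conversion is shown below. It is not the attached Test.java; the class and method names (`Windows31JDecodeSketch`, `toHex`, `convert`) are hypothetical, and it assumes the running JDK provides the `windows-31j` charset. It decodes each byte sequence with the `String(byte[], Charset)` constructor, which substitutes the replacement character for malformed input, and then prints the UTF-16BE bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Windows31JDecodeSketch {

    // Formats bytes as space-separated upper-case hex pairs, e.g. "81 E8".
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Decodes Windows-31J bytes and returns the UTF-16BE bytes as hex.
    // Assumption: the "windows-31j" charset is available in this runtime.
    static String convert(byte[] input) {
        Charset windows31j = Charset.forName("windows-31j");
        String decoded = new String(input, windows31j);
        return toHex(decoded.getBytes(StandardCharsets.UTF_16BE));
    }

    public static void main(String[] args) {
        byte[][] inputs = {
            // 2-74, 2-74
            {(byte) 0x81, (byte) 0xE8, (byte) 0x81, (byte) 0xE8},
            // 2-74, 2-75 (empty cell), 2-74
            {(byte) 0x81, (byte) 0xE8, (byte) 0x81, (byte) 0xE9, (byte) 0x81, (byte) 0xE8},
        };
        for (byte[] input : inputs) {
            System.out.println("Windows-31J : " + toHex(input));
            System.out.println("UTF_16BE : " + convert(input));
            System.out.println();
        }
    }
}
```

On the JDK builds discussed in this issue, the second input is where the result (222C FFFD 9A55 FFFD) differs from the alternative (222C FFFD 222C).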

After digging into the source code, my understanding is that the current behavior was introduced as part of the patch for https://bugs.openjdk.java.net/browse/JDK-8008386

The specific change is in DoubleByte.java (http://hg.openjdk.java.net/jdk8/jdk8/jdk/rev/3b00bf85a6f5#l1.43). The fallback logic treats only the first byte as invalid if any of the following conditions is met: 1) the first byte is not a leading byte, 2) the second byte is a leading byte, 3) the second byte can be decoded as a single-byte character.

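In the scenario above (with the empty cell), the second byte (E9) is a valid leading byte, so only the first byte (81) is replaced with FFFD. Decoding then resumes at E9, so E9 81 is decoded as 9A55 and the trailing E8, which has no second byte left, becomes another FFFD; this accounts for the observed output. It might make sense to relax this check slightly by dropping condition 2), so that the empty cell is treated as an invalid double-byte sequence.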
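The three conditions, and the effect of dropping condition 2), can be modelled roughly as follows. This is a simplified sketch and not the code in DoubleByte.java: the class and method names are hypothetical, and the leading-byte and single-byte ranges are assumed Shift_JIS-style ranges rather than the decoder's real lookup tables.

```java
public class FallbackSketch {

    // Assumption: simplified leading-byte ranges for a Shift_JIS-style charset.
    static boolean isLeadingByte(int b) {
        return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
    }

    // Assumption: ASCII and half-width katakana decode as single bytes.
    static boolean isSingleByte(int b) {
        return (b >= 0x00 && b <= 0x7F) || (b >= 0xA1 && b <= 0xDF);
    }

    // Number of bytes treated as invalid for an undecodable pair (b1, b2),
    // following the three conditions described in this issue.
    static int invalidLength(int b1, int b2) {
        if (!isLeadingByte(b1)      // 1) first byte is not a leading byte
            || isLeadingByte(b2)    // 2) second byte is a leading byte
            || isSingleByte(b2)) {  // 3) second byte decodes as a single byte
            return 1;
        }
        return 2;
    }

    // The relaxed variant suggested in this issue: condition 2) is dropped.
    static int relaxedInvalidLength(int b1, int b2) {
        if (!isLeadingByte(b1) || isSingleByte(b2)) {
            return 1;
        }
        return 2;
    }

    public static void main(String[] args) {
        // The empty cell 2-75 is encoded as 81 E9.
        System.out.println("current: " + invalidLength(0x81, 0xE9));
        System.out.println("relaxed: " + relaxedInvalidLength(0x81, 0xE9));
    }
}
```

Under these assumed ranges, the pair 81 E9 is treated as one invalid byte by the current conditions and as two invalid bytes by the relaxed variant, which corresponds to a single FFFD for the whole empty cell.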

Attachment: Test.java (1 kB, 2019-07-11 04:29)

Assignee: Naoto Sato
Reporter: Dmitry Cherepanov
Votes: 0
Watchers: 2
Created: 2019-07-11 04:29
Updated: 2024-10-09 12:17
Resolved: 2019-07-12 06:14
