Type: Bug
Resolution: Fixed
Priority: P4
Fix Version/s: 21
Affects Version/s: None
Component/s: tools
Labels:
None
Environment:

jdk-21+12-20-gf3abc4063de

Subcomponent:
javac
Resolved In Build:
b16

In Java classfiles, "Modified UTF-8" encoding is used to encode 16-bit Unicode characters.

When reading UTF-8 strings from classfiles, the compiler does the minimum amount of work possible to decode each character. In particular, it does not validate that the characters are properly encoded:

* It doesn't verify that 2nd and 3rd bytes have 10 as the top two bits
* It doesn't verify that \u0000 is encoded in two bytes (as is required for "Modified UTF-8")
* It doesn't verify that the shortest possible encoding was used for each character.
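The three checks listed above could be sketched roughly as follows. This is an illustrative sketch only, not the code from the actual fix: the class name `ModifiedUtf8Check` and the method `isValid` are hypothetical, and it assumes that only one-, two-, and three-byte forms are legal, with `C0 80` as the sole permitted encoding of \u0000.

```java
// Hypothetical helper, not javac's actual implementation.
final class ModifiedUtf8Check {

    /** Returns true if buf[off..off+len) is strictly encoded Modified UTF-8. */
    static boolean isValid(byte[] buf, int off, int len) {
        int i = off;
        int end = off + len;
        while (i < end) {
            int b0 = buf[i] & 0xFF;
            if (b0 == 0) {
                return false;                       // \u0000 must be encoded as C0 80
            } else if (b0 < 0x80) {
                i += 1;                             // one-byte form: \u0001..\u007F
            } else if ((b0 & 0xE0) == 0xC0) {       // two-byte form: 110xxxxx 10xxxxxx
                if (i + 1 >= end) return false;     // truncated sequence
                int b1 = buf[i + 1] & 0xFF;
                if ((b1 & 0xC0) != 0x80) return false;      // 2nd byte must start with 10
                int c = ((b0 & 0x1F) << 6) | (b1 & 0x3F);
                if (c != 0 && c < 0x80) return false;       // not the shortest encoding
                i += 2;
            } else if ((b0 & 0xF0) == 0xE0) {       // three-byte form: 1110xxxx 10xxxxxx 10xxxxxx
                if (i + 2 >= end) return false;     // truncated sequence
                int b1 = buf[i + 1] & 0xFF;
                int b2 = buf[i + 2] & 0xFF;
                if ((b1 & 0xC0) != 0x80 || (b2 & 0xC0) != 0x80) return false;
                int c = ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F);
                if (c < 0x800) return false;                // not the shortest encoding
                i += 3;
            } else {
                return false;                       // stray continuation byte or 1111xxxx lead byte
            }
        }
        return true;
    }
}
```

For example, the bytes `C1 81` decode to 'A' under a lenient decoder, but `isValid` rejects them because 'A' fits in the one-byte form.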

This lack of validation means the compiler will accept classfiles that the JVM would not, which is somewhat bad.

But a worse problem is that because it does not strictly validate the UTF-8 encoding, the compiler allows multiple encodings for the same character sequence. This is bad because the Names table, which is supposed to guarantee uniqueness, does that by hashing the UTF-8 data. So if the compiler reads a classfile that includes the same Name encoded in two different ways, it will add a duplicate Name to the table.
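To make the duplicate-Name problem concrete, here is a small sketch under simplifying assumptions. The table is modeled as a plain set keyed by raw bytes, and `lenientDecode` is a hypothetical stand-in for a decoder that skips validation; neither is javac's real data structure or code.

```java
import java.nio.ByteBuffer;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo, not javac's Names table.
final class DuplicateNameDemo {

    /** Decodes 1-, 2-, and 3-byte forms without validating anything. */
    static String lenientDecode(byte[] buf) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < buf.length) {
            int b0 = buf[i++] & 0xFF;
            if (b0 < 0x80) {
                sb.append((char) b0);
            } else if ((b0 & 0xE0) == 0xC0) {
                sb.append((char) (((b0 & 0x1F) << 6) | (buf[i++] & 0x3F)));
            } else {
                int b1 = buf[i++] & 0x3F;
                int b2 = buf[i++] & 0x3F;
                sb.append((char) (((b0 & 0x0F) << 12) | (b1 << 6) | b2));
            }
        }
        return sb.toString();
    }

    /** Counts distinct entries when uniqueness is keyed on the encoded bytes. */
    static int countDistinctByBytes(byte[]... encodedNames) {
        Set<ByteBuffer> table = new HashSet<>();
        for (byte[] name : encodedNames) {
            table.add(ByteBuffer.wrap(name));   // equals/hashCode compare byte content
        }
        return table.size();
    }
}
```

With `byte[] shortest = { 0x41 }` and `byte[] overlong = { (byte) 0xC1, (byte) 0x81 }`, both arrays decode to "A" via `lenientDecode`, yet `countDistinctByBytes(shortest, overlong)` reports two entries, because the byte sequences differ.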

csr for

JDK-8304447 Compiler should disallow non-standard UTF-8 string encodings

Closed

links to

Commit openjdk/jdk/c1f5ca11

Review openjdk/jdk/12893

Assignee: Archie Cobbs

Reporter: Archie Cobbs

Votes: 0

Watchers: 4

Created: 2023-03-05 15:25

Updated: 2023-04-03 12:10

Resolved: 2023-03-28 09:17
