The documentation states that readUTF can throw a UTFDataFormatException if the input stream contains invalid UTF-8. However, the stream that readUTF consumes never contains standard UTF-8. Instead, readUTF reads a serialized String. The encoding of that serialized form should be irrelevant to callers, but it is in fact a modified UTF-8.
The distinction between UTF-8 and "modified" UTF-8 is important because standard UTF-8 tools will not correctly interpret the "modified" UTF-8 used for String values. Developers may mistakenly believe that readUTF and writeUTF store Strings as standard UTF-8, when in fact they do not.
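The difference can be illustrated with a minimal sketch that compares the bytes produced by DataOutputStream.writeUTF with those from String.getBytes(StandardCharsets.UTF_8). The class name ModifiedUtf8Demo and its helper methods are hypothetical and exist only for this illustration. The sketch relies on two documented properties of modified UTF-8: U+0000 is written as two bytes rather than one, and supplementary characters are written as two separately encoded surrogates rather than as one four-byte sequence.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the JDK.
public class ModifiedUtf8Demo {

    // Bytes produced by writeUTF, without the leading 2-byte length prefix.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Bytes produced by the standard UTF-8 charset.
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String nul = "\u0000";             // U+0000
        String emoji = "\uD83D\uDE00";     // U+1F600 as a surrogate pair

        System.out.println(hex(modifiedUtf8(nul)));    // C0 80
        System.out.println(hex(standardUtf8(nul)));    // 00
        System.out.println(hex(modifiedUtf8(emoji)));  // ED A0 BD ED B8 80
        System.out.println(hex(standardUtf8(emoji)));  // F0 9F 98 80
    }
}
```

For plain ASCII text the two encodings coincide, which is likely why the mismatch often goes unnoticed until a strict UTF-8 decoder rejects the C0 80 or surrogate sequences.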
- relates to: JDK-4412509 DataInputStream and DataOutputStream needs better name for method (Closed)