A DESCRIPTION OF THE PROBLEM :
The interface-level documentation for DataInput describes in some detail how Java encodes Unicode characters in UTF-8, and notes that the encoded form of a character has a variable length. However, the documentation for the readChar() method states "A Unicode char is made up of two bytes." and then describes how the two bytes are combined into one char. The two statements appear to contradict each other. I suspect the readChar() javadoc is wrong, but there may be a subtle difference between the meaning of 'encoding' and 'reading' here. If so, the documentation does not make it clear.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Remove the statement "A Unicode char is made up of two bytes." and the discussion that follows it of how the two bytes are combined into a char.
- or -
Describe the difference between the UTF-8 encoding and what readChar() reads.
ACTUAL -
char readChar()
throws IOException
Reads an input char and returns the char value. A Unicode char is made up of two bytes. Let a be the first byte read and b be the second byte. The value returned is:
(char)((a << 8) | (b & 0xff))
This method is suitable for reading bytes written by the writeChar method of interface DataOutput.
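To make the apparent contradiction concrete, here is a minimal sketch that compares the bytes written by writeChar with those written by writeUTF for the same character. The class name ReadCharDemo and its helper methods are hypothetical names chosen for illustration. It assumes the standard DataInputStream and DataOutputStream implementations, and it only shows what those classes do, not what the javadoc is meant to say.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; the names below are not part of the JDK.
public class ReadCharDemo {

    // The formula quoted from the readChar() javadoc.
    static char combine(byte a, byte b) {
        return (char) ((a << 8) | (b & 0xff));
    }

    // Bytes produced by DataOutput.writeChar for a single char.
    static byte[] writeCharBytes(char c) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            new DataOutputStream(buf).writeChar(c);
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Bytes produced by DataOutput.writeUTF: a two-byte length prefix
    // followed by the variable-length encoded characters.
    static byte[] writeUtfBytes(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            new DataOutputStream(buf).writeUTF(s);
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads one char from the given bytes with DataInput.readChar.
    static char readCharFrom(byte[] bytes) {
        try {
            return new DataInputStream(new ByteArrayInputStream(bytes)).readChar();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Formats bytes as space-separated hex pairs for display.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        char euro = '\u20AC'; // EURO SIGN, needs three bytes in UTF-8
        System.out.println("writeChar: " + hex(writeCharBytes(euro)));
        System.out.println("writeUTF:  " + hex(writeUtfBytes(String.valueOf(euro))));
        System.out.println("round trip ok: " + (readCharFrom(writeCharBytes(euro)) == euro));
    }
}
```

If this sketch is right, writeChar always emits exactly two bytes per char, while writeUTF emits a length prefix plus a variable number of bytes. That suggests readChar() and the UTF-8 discussion describe two different on-stream formats, which is the distinction the javadoc could spell out.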
URL OF FAULTY DOCUMENTATION :
docs/api/java/io/DataInput.html#readChar()