Type: Bug
Resolution: Fixed
Priority: P4
Affects Version/s: 11, 13, 14
Resolved In Build: b16
Verification: Not verified
A DESCRIPTION OF THE PROBLEM :
The documentation for the `maxBytesPerChar` parameter of the constructors of the CharsetEncoder class is currently:
>A positive float value indicating the maximum number of bytes that will be produced for each input character
It is not clear whether, or how, bytes that are added independently of the character count (e.g. a BOM) should be accounted for.
String.getBytes(Charset) requires that the value returned by maxBytesPerChar() include all bytes that are produced independently of the character count; otherwise a BufferOverflowException is thrown. The maxBytesPerChar documentation should therefore be clearer about this.
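For reference, the following sketch illustrates the sizing contract this implies; the helper name encodeWithWorstCaseBuffer and the exact rounding are illustrative assumptions, not the actual String.getBytes(Charset) implementation. The output buffer is dimensioned solely from input length times maxBytesPerChar(), and any result other than underflow is surfaced via CoderResult.throwException(), which turns an OVERFLOW result into the BufferOverflowException described above.

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;

public class WorstCaseBufferSketch {
    /**
     * Encodes 'input' into a buffer sized only from maxBytesPerChar(),
     * mirroring the worst-case estimate String.getBytes(Charset) appears
     * to rely on. Any bytes the encoder emits beyond
     * input.length() * maxBytesPerChar() (e.g. a BOM or marker prefix)
     * lead to an OVERFLOW result, which throwException() reports as a
     * BufferOverflowException.
     */
    static byte[] encodeWithWorstCaseBuffer(CharsetEncoder encoder, String input)
            throws CharacterCodingException {
        int capacity = (int) Math.ceil(input.length() * (double) encoder.maxBytesPerChar());
        ByteBuffer out = ByteBuffer.allocate(capacity);

        CoderResult cr = encoder.encode(CharBuffer.wrap(input), out, true);
        if (!cr.isUnderflow()) {
            cr.throwException(); // OVERFLOW -> BufferOverflowException
        }
        cr = encoder.flush(out);
        if (!cr.isUnderflow()) {
            cr.throwException();
        }

        out.flip();
        byte[] result = new byte[out.remaining()];
        out.get(result);
        return result;
    }
}

Feeding this helper the custom encoder from the source below and the input "test" reproduces the exception: the 5 prefix bytes do not fit into the 4-byte worst-case buffer.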
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the code provided below. It implements a custom charset which prepends the encoded output with marker bytes (imagine they have a meaning, e.g. a BOM). The encoding itself is performed by casting char to byte, so the encoder reports a maxBytesPerChar() of 1.0f.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
No exception is thrown. The encoder encodes 1 char to 1 byte, so a maxBytesPerChar value of 1.0f seems reasonable.
ACTUAL -
A BufferOverflowException is thrown. The encoder should have included the length of the marker prefix in its maxBytesPerChar value.
---------- BEGIN SOURCE ----------
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
public class CharsetEncoderTest {
    public static void main(String[] args) {
        Charset customCharset = new Charset("debug-custom", new String[0]) {
            @Override
            public CharsetEncoder newEncoder() {
                /*
                 * maxBytesPerChar:
                 * A positive float value indicating the maximum number of bytes that will be produced
                 * for each input character
                 *
                 * It is not clear that this includes bytes which are written independently of the
                 * amount of actually encoded chars, e.g. a BOM or similar
                 */
                return new CharsetEncoder(this, 1.0f, 1.0f) {
                    final byte[] PREFIX = new byte[] {'d', 'e', 'b', 'u', 'g'};
                    ByteBuffer prefixBuf = ByteBuffer.wrap(PREFIX);

                    @Override
                    protected CoderResult encodeLoop(CharBuffer in, ByteBuffer out) {
                        // Write the prefix once when starting to encode
                        if (prefixBuf.hasRemaining()) {
                            // ByteBuffer does not provide a method to put only as many bytes as fit,
                            // therefore temporarily set the limit (and remaining) to the amount of
                            // data `out` can accept
                            prefixBuf.limit(Math.min(prefixBuf.capacity(), prefixBuf.position() + out.remaining()));
                            out.put(prefixBuf);
                            prefixBuf.limit(prefixBuf.capacity());
                        }
                        int maxEncode = Math.min(in.remaining(), out.remaining());
                        for (int i = 0; i < maxEncode; i++) {
                            // Very simple encoding by casting char -> byte
                            out.put((byte) in.get());
                        }
                        if (!in.hasRemaining() && !prefixBuf.hasRemaining()) {
                            return CoderResult.UNDERFLOW;
                        } else {
                            return CoderResult.OVERFLOW;
                        }
                    }
                };
            }

            @Override
            public CharsetDecoder newDecoder() {
                return new CharsetDecoder(this, 1.0f, 1.0f) {
                    @Override
                    protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
                        // Not relevant for this demo
                        // ...
                        in.position(in.limit());
                        return CoderResult.UNDERFLOW;
                    }
                };
            }

            @Override
            public boolean contains(Charset cs) {
                return cs == this;
            }
        };

        customCharset.encode("test"); // Works fine
        "test".getBytes(customCharset); // Throws BufferOverflowException
    }
}
---------- END SOURCE ----------
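A possible workaround for implementers of such a charset (not part of the original report, just a sketch under the assumption that the worst-case buffer is sized as length * maxBytesPerChar()): fold the fixed prefix length into the maxBytesPerChar value passed to the CharsetEncoder constructor, so the worst-case estimate covers the prefix for every non-empty input.

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;

public class CharsetEncoderWorkaround {
    public static void main(String[] args) {
        final byte[] prefix = {'d', 'e', 'b', 'u', 'g'};

        Charset customCharset = new Charset("debug-custom-fixed", new String[0]) {
            @Override
            public CharsetEncoder newEncoder() {
                // Fold the fixed 5-byte prefix into maxBytesPerChar (1.0f + 5 = 6.0f)
                // so that a buffer of length * maxBytesPerChar() always has room for
                // the prefix plus the encoded characters.
                return new CharsetEncoder(this, 1.0f, 1.0f + prefix.length) {
                    private final ByteBuffer prefixBuf = ByteBuffer.wrap(prefix);

                    @Override
                    protected CoderResult encodeLoop(CharBuffer in, ByteBuffer out) {
                        // Same prefix-then-cast encoding as in the reproducer above
                        if (prefixBuf.hasRemaining()) {
                            prefixBuf.limit(Math.min(prefixBuf.capacity(),
                                    prefixBuf.position() + out.remaining()));
                            out.put(prefixBuf);
                            prefixBuf.limit(prefixBuf.capacity());
                        }
                        int maxEncode = Math.min(in.remaining(), out.remaining());
                        for (int i = 0; i < maxEncode; i++) {
                            out.put((byte) in.get());
                        }
                        return (!in.hasRemaining() && !prefixBuf.hasRemaining())
                                ? CoderResult.UNDERFLOW : CoderResult.OVERFLOW;
                    }
                };
            }

            @Override
            public CharsetDecoder newDecoder() {
                return new CharsetDecoder(this, 1.0f, 1.0f) {
                    @Override
                    protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
                        in.position(in.limit()); // decoding is not relevant for this demo
                        return CoderResult.UNDERFLOW;
                    }
                };
            }

            @Override
            public boolean contains(Charset cs) {
                return cs == this;
            }
        };

        // 4 chars * 6.0f = 24-byte worst-case buffer; actual output is
        // 5 prefix bytes + 4 encoded bytes = 9 bytes, no exception.
        byte[] bytes = "test".getBytes(customCharset);
        System.out.println(bytes.length); // prints 9
    }
}

Note that the estimate n * maxBytesPerChar() cannot cover a fixed prefix when n == 0, so this workaround only helps for non-empty inputs.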
CSR for:
- JDK-8231319: API Doc for CharsetEncoder.maxBytesPerChar() should be clearer about BOMs (Closed)

Relates to:
- JDK-8231434: Add minBytesPerSequence() to java.nio.charsets.CharsetEncoder (Open)
- JDK-8148847: javadoc for CharsetEncoder.maxBytesPerChar() should be made clearer (Closed)
- JDK-8262187: CharsetEncoder.maxBytesPerChar() and CharsetDecoder.maxCharsPerByte() return float instead of int (Open)