  1. JDK
  2. JDK-8276213

Clarify that CharsetEncoder.maxBytesPerChar() and CharsetDecoder.maxCharsPerByte() should not be used as minimum


Details

    • Enhancement
    • Resolution: Unresolved
    • P4
    • None
    • 17
    • core-libs

    Description

      A DESCRIPTION OF THE PROBLEM :
      Users of CharsetEncoder.maxBytesPerChar() and CharsetDecoder.maxCharsPerByte() sometimes assume that the returned value is a sufficient size for the output buffer, and then fail to handle an OVERFLOW result from `encode` / `decode` correctly. Such code is incorrect, because some input can only be processed as a unit: an encoder has to consume both chars of a surrogate pair together, and a decoder has to produce both chars of a surrogate pair when it decodes the bytes of a supplementary code point. In either case the output for that unit can be larger than the value returned by these methods.

      It would be good to add something similar to the following to the documentation of the methods (feel free to use less confusing wording):
      - maxBytesPerChar():
      It should not be assumed that an output buffer of exactly this size is sufficient for every encoding operation, regardless of how many characters remain to be encoded. Some charsets can only encode certain characters together, as a unit, and therefore require an output buffer larger than this size. When calling {@link #encode(CharBuffer, ByteBuffer, boolean) encode}, the returned coder result must always be checked, and a larger output buffer must be provided if it indicates an overflow.


      - maxCharsPerByte():
      It should not be assumed that an output buffer of exactly this size is sufficient for every decoding operation, regardless of how many bytes remain to be decoded. Some charsets can only decode certain byte sequences as a unit that yields multiple characters, and therefore require an output buffer larger than this size. When calling {@link #decode(ByteBuffer, CharBuffer, boolean) decode}, the returned coder result must always be checked, and a larger output buffer must be provided if it indicates an overflow.
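
The handling described in the proposed wording (check the coder result, enlarge the output buffer on overflow) could be sketched roughly as follows. This is only an illustration, not proposed JDK code: the class `GrowingEncode`, the helpers `encodeAll` and `grow`, and the doubling strategy are all hypothetical names and choices made for this example.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class GrowingEncode {
    // Hypothetical helper: encodes all of `text`, starting from a possibly
    // too-small buffer and enlarging it whenever OVERFLOW is reported.
    static byte[] encodeAll(Charset cs, CharSequence text, int initialCapacity) {
        CharsetEncoder encoder = cs.newEncoder();
        CharBuffer in = CharBuffer.wrap(text);
        ByteBuffer out = ByteBuffer.allocate(Math.max(1, initialCapacity));
        boolean flushing = false;
        while (true) {
            CoderResult result = flushing
                    ? encoder.flush(out)
                    : encoder.encode(in, out, true);
            if (result.isOverflow()) {
                // Not an error: the encoder needs more room to make progress.
                out = grow(out);
            } else if (result.isUnderflow()) {
                if (flushing) {
                    break;          // input consumed and encoder flushed
                }
                flushing = true;    // all input consumed, flush next
            } else {
                // Malformed or unmappable input (assumed policy: fail fast).
                throw new IllegalArgumentException("cannot encode: " + result);
            }
        }
        out.flip();
        byte[] bytes = new byte[out.remaining()];
        out.get(bytes);
        return bytes;
    }

    // Hypothetical helper: copies the bytes written so far into a buffer
    // with twice the capacity (doubling is an arbitrary choice).
    static ByteBuffer grow(ByteBuffer old) {
        ByteBuffer bigger = ByteBuffer.allocate(old.capacity() * 2);
        old.flip();
        bigger.put(old);
        return bigger;
    }

    public static void main(String[] args) {
        // Start with a 1-byte buffer; the surrogate pair forces it to grow.
        byte[] bytes = encodeAll(StandardCharsets.UTF_8, "\uD800\uDC00", 1);
        System.out.println(bytes.length);
    }
}
```

The same loop shape applies to CharsetDecoder, with the roles of ByteBuffer and CharBuffer swapped.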

      ------

      Here is an example showcasing this issue:

      import java.nio.ByteBuffer;
      import java.nio.CharBuffer;
      import java.nio.charset.Charset;
      import java.nio.charset.CharsetDecoder;
      import java.nio.charset.CharsetEncoder;
      import java.nio.charset.StandardCharsets;

      public class EncoderDecoderTest {
          public static void main(String[] args) {
              Charset c = StandardCharsets.UTF_8;

              {
                  CharsetEncoder encoder = c.newEncoder();
                  // UTF-8 reports 3.0 here, so the buffer below holds 3 bytes
                  float bytesPerChar = encoder.maxBytesPerChar();
                  ByteBuffer out = ByteBuffer.allocate((int) Math.ceil(bytesPerChar));
                  // The surrogate pair (U+10000) needs 4 bytes and cannot be split: prints OVERFLOW
                  System.out.println(encoder.encode(CharBuffer.wrap("\uD800\uDC00"), out, false));
              }

              {
                  CharsetDecoder decoder = c.newDecoder();
                  // UTF-8 reports 1.0 here, so the buffer below holds 1 char
                  float charsPerByte = decoder.maxCharsPerByte();
                  CharBuffer out = CharBuffer.allocate((int) Math.ceil(charsPerByte));
                  // The bytes F0 90 80 80 (U+10000) decode to 2 chars at once: prints OVERFLOW
                  System.out.println(decoder.decode(ByteBuffer.wrap(new byte[] {-16, -112, -128, -128}), out, false));
              }
          }
      }
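
For contrast, here is a rough sketch of sizing the output buffer from the length of the whole input, which is the use the existing documentation describes for these values ("worst-case size of the output buffer required for a given input sequence"). The class `WorstCaseSizing` and its methods are hypothetical names for this illustration; with the same UTF-8 inputs as in the example above, both calls are expected to finish without overflow, but the result should still be checked in real code.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class WorstCaseSizing {
    // Hypothetical helper: sizes the byte buffer for the whole input
    // (number of chars * maxBytesPerChar), not for a single char.
    static CoderResult encodeSized(Charset cs, String text) {
        CharsetEncoder encoder = cs.newEncoder();
        CharBuffer in = CharBuffer.wrap(text);
        int capacity = (int) Math.ceil(in.remaining() * (double) encoder.maxBytesPerChar());
        ByteBuffer out = ByteBuffer.allocate(capacity);
        return encoder.encode(in, out, true);
    }

    // Hypothetical helper: sizes the char buffer for the whole input
    // (number of bytes * maxCharsPerByte), not for a single byte.
    static CoderResult decodeSized(Charset cs, byte[] bytes) {
        CharsetDecoder decoder = cs.newDecoder();
        ByteBuffer in = ByteBuffer.wrap(bytes);
        int capacity = (int) Math.ceil(in.remaining() * (double) decoder.maxCharsPerByte());
        CharBuffer out = CharBuffer.allocate(capacity);
        return decoder.decode(in, out, true);
    }

    public static void main(String[] args) {
        Charset c = StandardCharsets.UTF_8;
        // Same inputs as the reproducer: U+10000 as a surrogate pair and as UTF-8 bytes.
        System.out.println(encodeSized(c, "\uD800\uDC00"));
        System.out.println(decodeSized(c, new byte[] {-16, -112, -128, -128}));
    }
}
```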



          People

            naoto Naoto Sato
            webbuggrp Webbug Group