CharsetEncoder.canEncode currently receives only a single char as the basis for its decision. One char is often not sufficient:
- if the char is in the high surrogate range, whether the pair can be encoded depends on the following char.
- if the char is in the combining Jamo range, whether the combination can be encoded may depend on several following chars.
- if the character encoder supports sequences with combining marks, and the char is a supported base character, whether the sequence can be encoded may depend on several following chars.
canEncode should therefore be given access to a CharBuffer. Just like encode, it may need to signal underflow, for example by throwing a BufferUnderflowException, when it needs more input data to decide. To reduce the number of calls, it may be useful to have canEncode return an integer indicating the number of chars that can be converted before a non-convertible char or the end of the buffer is encountered.
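
The integer-returning variant could look roughly like the sketch below. It is only an illustration built on the existing encode(CharBuffer, ByteBuffer, boolean) method, not a proposed implementation: the class EncodablePrefix and the method encodablePrefixLength are hypothetical names, and the sketch assumes the caller has supplied all available input (so a trailing lone high surrogate counts as non-convertible instead of triggering an underflow signal).

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

public class EncodablePrefix {

    /**
     * Hypothetical helper (not part of java.nio): returns the number of chars,
     * counted from the buffer's current position, that the encoder can convert
     * before it meets a malformed or unmappable sequence or the end of the buffer.
     * The caller's buffer position is left unchanged.
     * Note: this resets the encoder and switches its error actions to REPORT.
     */
    static int encodablePrefixLength(CharsetEncoder encoder, CharBuffer input) {
        CharBuffer in = input.duplicate();   // independent position, shared content
        int start = in.position();

        // Scratch space for the encoded bytes, which are discarded.
        int scratchSize = (int) Math.ceil(encoder.maxBytesPerChar()) * 16;
        ByteBuffer scratch = ByteBuffer.allocate(scratchSize);

        encoder.reset()
               .onMalformedInput(CodingErrorAction.REPORT)
               .onUnmappableCharacter(CodingErrorAction.REPORT);

        while (true) {
            // endOfInput = true: assumption that no further chars will follow.
            CoderResult result = encoder.encode(in, scratch, true);
            if (result.isOverflow()) {
                scratch.clear();             // throw the bytes away and continue
                continue;
            }
            // UNDERFLOW: everything was consumed.
            // Error result: in.position() is at the start of the offending sequence.
            break;
        }
        encoder.reset();
        return in.position() - start;
    }
}
```

For example, calling encodablePrefixLength with an ISO-8859-1 encoder on CharBuffer.wrap("abc\u20ACdef") yields 3, because the euro sign is the first char that encoder cannot map. Because the encoder sees the whole buffer, a surrogate pair is judged as a unit rather than one char at a time.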