- Bug
- Resolution: Fixed
- P3
- 7
- b35
- generic
- generic
- Not verified
The UTF-8 coder can get a dramatic speedup from a special method
that handles only ASCII and delegates to a general-purpose method
as soon as the input contains a non-ASCII byte.
Here's the kind of method I'm thinking of:
    private CoderResult decodeArrayLoop(ByteBuffer src,
                                        CharBuffer dst)
    {
        byte[] sa = src.array();
        int sp = src.arrayOffset() + src.position();
        int sl = src.arrayOffset() + src.limit();
        char[] da = dst.array();
        int dp = dst.arrayOffset() + dst.position();
        int dl = dst.arrayOffset() + dst.limit();
        CoderResult result = null;
        // Fast path: copy bytes straight to chars while they are ASCII.
        for (;;) {
            if (sp >= sl) {
                result = CoderResult.UNDERFLOW;   // all input consumed
                break;
            }
            int b = sa[sp];
            if (b < 0)                            // high bit set: non-ASCII
                break;
            if (dp >= dl) {
                result = CoderResult.OVERFLOW;    // no room in output
                break;
            }
            da[dp++] = (char) b;
            sp++;
        }
        // Record progress, then hand any non-ASCII remainder to the
        // general-purpose loop.
        src.position(sp - src.arrayOffset());
        dst.position(dp - dst.arrayOffset());
        return result != null ? result : decodeArrayLoop1(src, dst);
    }
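To try the fast path outside the JDK, the same loop can be wrapped in a standalone class. This is a sketch under assumptions: the class name `AsciiFastPathDecoder` is hypothetical, it assumes array-backed buffers (as the method above does), it treats `src` as the complete input, and in place of the JDK-internal `decodeArrayLoop1` it delegates the non-ASCII remainder to a decoder from `StandardCharsets.UTF_8.newDecoder()`.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical standalone harness for the ASCII fast path; not JDK code.
final class AsciiFastPathDecoder {
    // Stand-in for the general-purpose loop (decodeArrayLoop1 in the report).
    private final CharsetDecoder general = StandardCharsets.UTF_8.newDecoder();

    // Assumes both buffers are array-backed and src holds the whole input.
    CoderResult decode(ByteBuffer src, CharBuffer dst) {
        byte[] sa = src.array();
        int sp = src.arrayOffset() + src.position();
        int sl = src.arrayOffset() + src.limit();
        char[] da = dst.array();
        int dp = dst.arrayOffset() + dst.position();
        int dl = dst.arrayOffset() + dst.limit();
        CoderResult result = null;
        while (result == null) {
            if (sp >= sl) {
                result = CoderResult.UNDERFLOW;   // all input consumed
            } else if (sa[sp] < 0) {
                break;                            // first non-ASCII byte
            } else if (dp >= dl) {
                result = CoderResult.OVERFLOW;    // output full
            } else {
                da[dp++] = (char) sa[sp++];       // ASCII byte value == char value
            }
        }
        src.position(sp - src.arrayOffset());
        dst.position(dp - dst.arrayOffset());
        if (result != null)
            return result;
        // Non-ASCII input: hand the rest to the general-purpose decoder.
        general.reset();
        return general.decode(src, dst, true);    // endOfInput = true
    }
}
```

For pure-ASCII input the general decoder is never touched; the first byte with its high bit set ends the fast loop with both buffer positions already advanced, so the delegate resumes exactly there.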
The non-ASCII decoder case can be sped up as well, by avoiding the big switch on the lead byte.
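One way to avoid a switch, sketched here under assumptions, is to classify the lead byte with arithmetic shifts on its sign-extended value: `(b1 >> 5) == -2` matches 110xxxxx and `(b1 >> 4) == -2` matches 1110xxxx. The class `BranchyUtf8` and its methods are hypothetical. The sketch decodes a whole array, handles only 1- to 3-byte sequences, and throws on anything else (4-byte sequences, 0xC0/0xC1 leads, bad or missing continuation bytes); it does not reject overlong 3-byte forms or encoded surrogates, which a real decoder must.

```java
// Hypothetical sketch: if/else range tests on the lead byte instead of a switch.
final class BranchyUtf8 {
    static String decode(byte[] sa) {
        char[] da = new char[sa.length];           // never more chars than bytes here
        int sp = 0, dp = 0;
        while (sp < sa.length) {
            int b1 = sa[sp];                       // sign-extended lead byte
            if (b1 >= 0) {                         // 0xxxxxxx: ASCII
                da[dp++] = (char) b1;
                sp += 1;
            } else if ((b1 >> 5) == -2 && b1 >= (byte) 0xC2) { // 110xxxxx, not overlong
                int b2 = continuation(sa, sp + 1);
                da[dp++] = (char) (((b1 & 0x1f) << 6) | (b2 & 0x3f));
                sp += 2;
            } else if ((b1 >> 4) == -2) {          // 1110xxxx
                int b2 = continuation(sa, sp + 1);
                int b3 = continuation(sa, sp + 2);
                da[dp++] = (char) (((b1 & 0x0f) << 12) | ((b2 & 0x3f) << 6) | (b3 & 0x3f));
                sp += 3;
            } else {
                throw new IllegalArgumentException("unsupported lead byte at " + sp);
            }
        }
        return new String(da, 0, dp);
    }

    // Returns sa[i] if it exists and is a continuation byte (10xxxxxx), else throws.
    private static int continuation(byte[] sa, int i) {
        if (i >= sa.length || (sa[i] & 0xc0) != 0x80)
            throw new IllegalArgumentException("malformed input at " + i);
        return sa[i];
    }
}
```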
More minor improvements:
---
We can get rid of the code below, since our implementation
always guarantees sp <= sl: users cannot create their own
(possibly buggy) ByteBuffer or CharBuffer implementations,
and even if they could, our code would be entitled to assume
they are not buggy.
    // assert (sp <= sl);
    // sp = (sp <= sl ? sp : sl);
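The invariant the removed lines guard against, position <= limit, is enforced by the public java.nio.Buffer API itself: `position(int)` rejects a position past the limit, and `limit(int)` pulls the position back if it would exceed the new limit. A quick demonstration (the class name `BufferInvariant` is hypothetical):

```java
import java.nio.ByteBuffer;

// Hypothetical demo: java.nio buffers never let position exceed limit.
final class BufferInvariant {
    static boolean positionPastLimitRejected() {
        ByteBuffer b = ByteBuffer.allocate(4);
        b.limit(2);
        try {
            b.position(3);             // beyond the limit
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }

    static int positionAfterShrinkingLimit() {
        ByteBuffer b = ByteBuffer.allocate(4);
        b.position(3);
        b.limit(1);                    // position is clamped to the new limit
        return b.position();
    }
}
```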
---
In the ASCII case, masking with 0x7f is useless,
since the 0x80 bit is already known to be clear.
    // da[dp++] = (char)(b1 & 0x7f);
    da[dp++] = (char) b1;
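A brute-force check over all 256 byte values makes the point concrete: only bytes with b1 >= 0 reach the ASCII branch, and for every one of them the 0x80 bit is clear and the mask leaves the value unchanged. The class name `AsciiMaskCheck` is hypothetical.

```java
// Hypothetical check: on the ASCII branch (b1 >= 0), masking with 0x7f never matters.
final class AsciiMaskCheck {
    static boolean maskIsRedundant() {
        for (int i = Byte.MIN_VALUE; i <= Byte.MAX_VALUE; i++) {
            byte b1 = (byte) i;
            if (b1 < 0)
                continue;                       // non-ASCII: not on this branch
            if ((b1 & 0x80) != 0 || (char) (b1 & 0x7f) != (char) b1)
                return false;
        }
        return true;
    }
}
```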
---
More deviously, we can snatch a few cycles in the 2-byte case
by folding both masks into a single XOR constant:
    da[dp++] = (char) (((b1 << 6) ^ b2) ^ 0x0f80);
    // da[dp++] = ((char)(((b1 & 0x1f) << 6) |
    //                    ((b2 & 0x3f) << 0)));
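Why the XOR form works: `b1` and `b2` are sign-extended bytes, so `b1 << 6` is 0xFFFFF000 plus the lead byte's five payload bits at bits 6 to 10, and `b2` is 0xFFFFFF80 plus its six payload bits. XORing the fixed parts leaves exactly 0x0f80, which the final XOR clears, and the payload bits do not overlap, so the result equals the masked form. The hypothetical check below compares both forms over every lead byte 0xC0 to 0xDF and every continuation byte 0x80 to 0xBF.

```java
// Hypothetical exhaustive check of the 2-byte XOR trick against the masked form.
final class TwoByteXorCheck {
    static boolean xorMatchesMask() {
        for (int lead = 0xC0; lead <= 0xDF; lead++) {
            for (int cont = 0x80; cont <= 0xBF; cont++) {
                int b1 = (byte) lead;   // sign-extended, as when read from a byte[]
                int b2 = (byte) cont;
                char viaXor  = (char) (((b1 << 6) ^ b2) ^ 0x0f80);
                char viaMask = (char) (((b1 & 0x1f) << 6) | (b2 & 0x3f));
                if (viaXor != viaMask)
                    return false;
            }
        }
        return true;
    }
}
```

For example, the bytes 0xC3 0xA9 give (0xFFFFF0C0 ^ 0xFFFFFFA9) ^ 0x0f80 = 0xE9, the char 'é'.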
---
This matters only for small coding operations, where the allocation
cost is not amortized, but we should instantiate a Surrogate.Generator
or Surrogate.Parser only in the (in practice unlikely) event that
surrogates occur in the input stream:
    if (sgg == null)
        sgg = new Surrogate.Generator();
    int gn = sgg.generate(uc, n, da, dp, dl);
    ....
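The lazy-creation pattern can be sketched without the JDK-internal `Surrogate` classes. Here `LazySurrogateWriter` and its nested `PairWriter` are hypothetical: `PairWriter` stands in for `Surrogate.Generator` and is built on the public `Character.toChars(int, char[], int)`. BMP code points are written directly, and the helper is created only when the first supplementary code point (one that needs a surrogate pair) appears.

```java
// Hypothetical sketch of lazily creating a surrogate helper.
final class LazySurrogateWriter {
    // Stand-in for sun.nio.cs.Surrogate.Generator (JDK-internal, not used here).
    private static final class PairWriter {
        // Writes a supplementary code point as a surrogate pair; -1 if no room.
        int generate(int uc, char[] da, int dp, int dl) {
            if (dl - dp < 2)
                return -1;
            return Character.toChars(uc, da, dp);   // supplementary: writes 2 chars, returns 2
        }
    }

    private PairWriter sgg;   // stays null until a surrogate pair is needed

    // Assumes uc is a valid code point; returns chars written at da[dp], or -1 on overflow.
    int put(int uc, char[] da, int dp, int dl) {
        if (uc <= 0xFFFF) {            // BMP: one char, no helper needed
            if (dp >= dl)
                return -1;
            da[dp] = (char) uc;
            return 1;
        }
        if (sgg == null)
            sgg = new PairWriter();    // first supplementary code point
        return sgg.generate(uc, da, dp, dl);
    }

    boolean helperCreated() {
        return sgg != null;
    }
}
```

Input that never contains a 4-byte sequence therefore never pays for the helper object.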
---
The comparison below is always true, since c is of type char,
so the test can be dropped:
    if (c <= '\uFFFF') {
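`char` is an unsigned 16-bit type whose largest value, `Character.MAX_VALUE`, is exactly '\uFFFF', so no char can fail the test. An exhaustive check (the class name `CharRangeCheck` is hypothetical):

```java
// Hypothetical check: every char value is at most 0xFFFF.
final class CharRangeCheck {
    static boolean alwaysTrue() {
        for (int i = Character.MIN_VALUE; i <= Character.MAX_VALUE; i++) {
            char c = (char) i;
            if (!(c <= '\uFFFF'))
                return false;
        }
        return Character.MAX_VALUE == '\uFFFF';
    }
}
```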
---