JDK / JDK-6636317

Optimize UTF-8 coder for ASCII input


    • b35
    • generic
    • generic
    • Not verified

      The UTF-8 coder can get a dramatic speedup from a special method
      that handles only ASCII and delegates to a general-purpose method
      if the input contains non-ASCII.

      Here's the kind of method I'm thinking of:

      private CoderResult decodeArrayLoop(ByteBuffer src, CharBuffer dst) {
          byte[] sa = src.array();
          int sp = src.arrayOffset() + src.position();
          int sl = src.arrayOffset() + src.limit();

          char[] da = dst.array();
          int dp = dst.arrayOffset() + dst.position();
          int dl = dst.arrayOffset() + dst.limit();

          CoderResult result = null;

          for (;;) {
              if (sp >= sl) {
                  result = CoderResult.UNDERFLOW;
                  break;
              }
              int b = sa[sp];
              if (b < 0)        // high bit set: non-ASCII, fall through to the general decoder
                  break;
              if (dp >= dl) {
                  result = CoderResult.OVERFLOW;
                  break;
              }
              da[dp++] = (char) b;
              sp++;
          }
          src.position(sp - src.arrayOffset());
          dst.position(dp - dst.arrayOffset());
          return result != null ? result : decodeArrayLoop1(src, dst);
      }
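The core of the fast path is just a scan-and-copy over the ASCII prefix. A minimal standalone sketch of that idea (the method and class names here are mine for illustration, not from the proposed patch):

```java
public class AsciiFastPath {
    // Copies bytes to chars until the first negative (non-ASCII) byte
    // or the end of input, returning how many chars were written.
    // A real decoder would hand the remainder to the general-purpose loop.
    static int decodeAsciiPrefix(byte[] src, int sp, int sl, char[] dst, int dp) {
        int start = dp;
        while (sp < sl) {
            int b = src[sp];
            if (b < 0)          // high bit set: not ASCII, stop here
                break;
            dst[dp++] = (char) b;
            sp++;
        }
        return dp - start;
    }
}
```

Because the loop body is a single compare, copy, and two increments, the JIT can typically turn it into very tight machine code, which is where the speedup comes from.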

      The non-ASCII decoder case can be sped up as well by avoiding the big switch.
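One way to avoid the switch is to classify the leading byte with range tests on its sign-extended value. A hedged sketch with illustrative names (decodeOne is not from the actual patch), covering only the 1-, 2-, and 3-byte cases:

```java
public class Utf8Dispatch {
    // Decodes one UTF-8 sequence starting at src[sp] into dst[dp],
    // dispatching on the leading byte with comparisons rather than a
    // switch on (b >> 4). Returns the new source position, or -1 for
    // the cases elided here. Sketch only: no bounds or validity checks.
    static int decodeOne(byte[] src, int sp, char[] dst, int dp) {
        int b1 = src[sp];                    // sign-extended byte
        if (b1 >= 0) {                       // 0xxxxxxx: ASCII
            dst[dp] = (char) b1;
            return sp + 1;
        } else if ((b1 >> 5) == -2) {        // 110xxxxx: 2-byte sequence
            int b2 = src[sp + 1];
            dst[dp] = (char) (((b1 & 0x1f) << 6) | (b2 & 0x3f));
            return sp + 2;
        } else if ((b1 >> 4) == -2) {        // 1110xxxx: 3-byte sequence
            int b2 = src[sp + 1], b3 = src[sp + 2];
            dst[dp] = (char) (((b1 & 0x0f) << 12)
                              | ((b2 & 0x3f) << 6)
                              | (b3 & 0x3f));
            return sp + 3;
        }
        return -1;                           // 4-byte and malformed cases elided
    }
}
```

The `(b1 >> 5) == -2` test works because the byte is sign-extended: arithmetic shift of a `110xxxxx` byte leaves only the sign and the `10` tag, which compare equal to -2.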
      More minor improvements:

      ---

      We can get rid of the code below,
      since our implementation always guarantees it,
      and users cannot create their own buggy ByteBuffer
      or CharBuffer implementations, and even if they did,
      our code is allowed to assume it is non-buggy.

      // assert (sp <= sl);
      // sp = (sp <= sl ? sp : sl);

      ---

      In the ASCII case, the &-ing with 0x7f is useless,
      since the 0x80 bit is already guaranteed to be off.

      // da[dp++] = (char)(b1 & 0x7f);
      da[dp++] = (char) b1;

      ---

      More deviously, we can snatch a few cycles in the 2-byte case
      as follows:

      da[dp++] = (char) (((b1 << 6) ^ b2) ^ 0x0f80);
      // da[dp++] = ((char)(((b1 & 0x1f) << 6) |
      //                    ((b2 & 0x3f) << 0)));
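The XOR form works because both bytes are sign-extended, so the constant 0x0f80 cancels exactly the `110.....`/`10......` tag bits that the shift and XOR smear into the result. The identity is small enough to check exhaustively; a quick sanity check (mine, not from the report):

```java
public class TwoByteXorCheck {
    // For every well-formed 2-byte UTF-8 sequence (b1 in 0xC2..0xDF,
    // b2 in 0x80..0xBF), the XOR trick must agree with the masked form.
    static boolean check() {
        for (int i = 0xC2; i <= 0xDF; i++) {
            for (int j = 0x80; j <= 0xBF; j++) {
                int b1 = (byte) i;  // sign-extended, as in the decoder
                int b2 = (byte) j;
                char fast = (char) (((b1 << 6) ^ b2) ^ 0x0f80);
                char slow = (char) (((b1 & 0x1f) << 6) | (b2 & 0x3f));
                if (fast != slow)
                    return false;
            }
        }
        return true;
    }
}
```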


      ---

      Only significant for smaller coding operations, but we should only
      instantiate a Surrogate.Generator or Surrogate.Parser in the unlikely
      (in the real world) event of surrogates in the input stream.

      if (sgg == null)
          sgg = new Surrogate.Generator();
      int gn = sgg.generate(uc, n, da, dp, dl);
      ....
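For illustration, the work such a generator does can be expressed with the public java.lang.Character API; a self-contained sketch (Character's surrogate methods, not the internal Surrogate.Generator):

```java
public class SurrogateSketch {
    // Writes code point uc into da at dp, returning the number of chars
    // written: 1 for a BMP code point, 2 for a surrogate pair. This is
    // the work the lazily-created generator performs in the decoder.
    static int generate(int uc, char[] da, int dp) {
        if (Character.isBmpCodePoint(uc)) {
            da[dp] = (char) uc;
            return 1;
        }
        da[dp] = Character.highSurrogate(uc);
        da[dp + 1] = Character.lowSurrogate(uc);
        return 2;
    }
}
```

Since supplementary code points are rare in typical input, deferring the allocation until this path is actually hit avoids per-operation garbage.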

      ---
      The comparison below is vacuously true, since c is of type char.

      if (c <= '\uFFFF') {
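As a quick demonstration (char is an unsigned 16-bit type whose maximum value is '\uFFFF', so the branch is always taken):

```java
public class CharRangeCheck {
    // Even the largest possible char satisfies c <= '\uFFFF',
    // so the comparison can never be false and the test is dead code.
    static boolean vacuouslyTrue() {
        char c = Character.MAX_VALUE;
        return c <= '\uFFFF';
    }
}
```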

      ---

            sherman Xueming Shen
            martin Martin Buchholz
