Type: Bug
Resolution: Fixed
Priority: P4
Fix Version/s: 9
Affects Version/s: 6u10
Component/s: core-libs
Labels:
- charset
- webbug

Subcomponent:
java.lang
Resolved In Build:
inapplicable
CPU:

x86
OS:

linux

FULL PRODUCT VERSION :
Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
Java HotSpot(TM) 64-Bit Server VM (build 11.0-b15, mixed mode)

and

Java(TM) SE Runtime Environment (build 1.7.0-ea-b38)
Java HotSpot(TM) 64-Bit Server VM (build 14.0-b05, mixed mode)

ADDITIONAL OS VERSION INFORMATION :
Linux 2.6.27.3 #1 SMP Sat Oct 25 10:15:42 EEST 2008 x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ AuthenticAMD GNU/Linux

A DESCRIPTION OF THE PROBLEM :
byte[] String.getBytes(Charset charset) is slower than byte[] String.getBytes(String encoding).
It should not be, because the latter has to do extra work (resolving the encoding name to a Charset) that the former does not.

There are two reasons why the Charset version is slower:
1) java.lang.StringCoding.encode(Charset, ...) needlessly copies the char[]
- This causes a slowdown of 4-7% for small strings, and the slowdown grows with string length.

2) java.lang.StringCoding.encode(Charset, ...) always creates a new StringEncoder
- Creating a new StringEncoder is slower than reusing the thread-locally cached one when the same charset is used repeatedly.

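Suggested replacement code for Java 6 and Java 5:

static byte[] encode(Charset cs, char[] ca, int off, int len) {
    // Reuse the thread-local cached StringEncoder if it was created for this
    // exact Charset instance; otherwise create a new one and cache it.
    StringEncoder se = deref(encoder);
    if (se == null || se.cs != cs) {
        se = new StringEncoder(cs, cs.name());
        set(encoder, se);
    }
    // Encode straight from ca, without first copying the char[].
    return se.encode(ca, off, len);
}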

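I did not use cs.equals(se.cs) because I think Charset instances are cached, so it is not easily possible to create two distinct Charset instances with the same name. Even if two such instances did exist, the identity check would merely miss the cache and create a new StringEncoder, so the result would still be correct.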
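The patch above depends on private members of java.lang.StringCoding, so it cannot be compiled outside the JDK. As an illustration only, the same idea (one cached encoder per thread, reused when the Charset instance is identical, and no copy of the input array) can be sketched with the public java.nio.charset API. The class name CachedEncoder is hypothetical, the sketch assumes a Charset that supports encoding (cs.newEncoder() throws UnsupportedOperationException otherwise), and REPLACE actions are configured to mirror the replacement behavior documented for String.getBytes(Charset).

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

/** Hypothetical helper: caches one CharsetEncoder per thread, keyed by Charset identity. */
public final class CachedEncoder {
    private static final ThreadLocal<CharsetEncoder> CACHE = new ThreadLocal<CharsetEncoder>();

    private CachedEncoder() {}

    public static byte[] encode(Charset cs, char[] ca, int off, int len) {
        CharsetEncoder enc = CACHE.get();
        // Same identity test as the suggested patch: reuse only for this exact instance.
        if (enc == null || enc.charset() != cs) {
            enc = cs.newEncoder()
                    .onMalformedInput(CodingErrorAction.REPLACE)
                    .onUnmappableCharacter(CodingErrorAction.REPLACE);
            CACHE.set(enc);
        }
        try {
            // CharBuffer.wrap views the array without copying it; encode(CharBuffer)
            // resets the encoder, encodes, and flushes in a single call.
            ByteBuffer bb = enc.encode(CharBuffer.wrap(ca, off, len));
            byte[] out = new byte[bb.remaining()];
            bb.get(out);
            return out;
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE actions, but the method declares it.
            throw new IllegalStateException(e);
        }
    }
}
```

Reusing the encoder this way is only safe because each instance stays confined to one thread; CharsetEncoder objects are not safe for concurrent use.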

Tests:
T1: Repeated encodings of 4-character strings in ISO-8859-1.
T2: Encodings of 4-character strings alternating between ISO-8859-1 and UTF-8, with two consecutive calls in each encoding before switching.

M1 = getBytes(String)
M2 = getBytes(Charset)
M3 = getBytes(Charset) with the suggested modifications applied

Results for Java 6 (run time relative to M1; lower is faster):
         M1      M2      M3
T1    1.000   1.792   1.000
T2    1.000   1.073   0.902

Results for Java 7 (run time relative to M1; lower is faster):
         M1      M2      M3
T1    1.000   1.792   1.042
T2    1.000   0.974   0.872
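Reading each entry as a run time r normalized to M1 = 1.000 (an interpretation consistent with the "up to 80% slower" figure in the expected-versus-actual section), a ratio converts to a percentage slowdown as follows; negative values mean faster than M1:

```latex
\begin{aligned}
\text{slowdown}(r) &= (r - 1) \times 100\,\% \\
\text{T1, M2 (Java 6 and 7):}\quad (1.792 - 1) \times 100\,\% &= 79.2\,\% \\
\text{T2, M3 (Java 6):}\quad (0.902 - 1) \times 100\,\% &= -9.8\,\% \\
\text{T2, M3 (Java 7):}\quad (0.872 - 1) \times 100\,\% &= -12.8\,\%
\end{aligned}
```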

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Compare the speed of String.getBytes(Charset) with that of String.getBytes(String).
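A crude way to follow this step is a System.nanoTime() loop over the T1 and T2 patterns described above, timing M1 and M2 (M3 needs a patched JDK, so it is omitted). The class GetBytesBench and the iteration count are hypothetical, and a single untimed warm-up pass is only a rough stand-in for a proper harness such as JMH, so absolute numbers will vary by JVM and machine.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

/** Hypothetical micro-benchmark: M1 = getBytes(String), M2 = getBytes(Charset). */
public final class GetBytesBench {
    static final Charset LATIN1 = Charset.forName("ISO-8859-1");
    static final Charset UTF8 = Charset.forName("UTF-8");

    /** T1 uses ISO-8859-1 throughout; T2 makes two calls per charset, then switches. */
    static Charset charsetFor(int i, boolean alternate) {
        return (!alternate || (i / 2) % 2 == 0) ? LATIN1 : UTF8;
    }

    /** M1: encode by charset name. */
    static long runByName(String s, int iterations, boolean alternate) {
        long sink = 0; // summed lengths keep the JIT from discarding the work
        try {
            for (int i = 0; i < iterations; i++) {
                sink += s.getBytes(charsetFor(i, alternate).name()).length;
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // not expected for these two standard charsets
        }
        return sink;
    }

    /** M2: encode by Charset object. */
    static long runByCharset(String s, int iterations, boolean alternate) {
        long sink = 0;
        for (int i = 0; i < iterations; i++) {
            sink += s.getBytes(charsetFor(i, alternate)).length;
        }
        return sink;
    }

    public static void main(String[] args) {
        String s = "abcd"; // 4-character string, as in T1 and T2
        int n = 5000000;
        for (boolean alternate : new boolean[] {false, true}) {
            // One untimed pass over each path so the JIT compiles them first.
            runByName(s, n, alternate);
            runByCharset(s, n, alternate);
            long t0 = System.nanoTime();
            long a = runByName(s, n, alternate);
            long t1 = System.nanoTime();
            long b = runByCharset(s, n, alternate);
            long t2 = System.nanoTime();
            System.out.printf("%s: M1 = 1.000, M2 = %.3f (checksums %d, %d)%n",
                    alternate ? "T2" : "T1", (double) (t2 - t1) / (t1 - t0), a, b);
        }
    }
}
```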

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
String.getBytes(Charset) should be as fast as or faster than String.getBytes(String).
ACTUAL -
String.getBytes(Charset) is up to 80% slower than String.getBytes(String).

REPRODUCIBILITY :
This bug can be reproduced always.

CUSTOMER SUBMITTED WORKAROUND :
Use String.getBytes(charset.name()) instead of String.getBytes(charset)
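The workaround can be wrapped in a small helper. String.getBytes(String) declares the checked UnsupportedEncodingException, so callers have to handle it; the class Workaround and the method getBytesByName are hypothetical names. One caveat: the name lookup only succeeds for charsets registered with the platform (built in or supplied through a CharsetProvider), so a Charset subclass instantiated directly would make this path throw where getBytes(Charset) would not.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

/** Hypothetical helper applying the reporter's workaround. */
public final class Workaround {
    private Workaround() {}

    /** Encodes s by charset name instead of by Charset object. */
    public static byte[] getBytesByName(String s, Charset cs) {
        try {
            return s.getBytes(cs.name());
        } catch (UnsupportedEncodingException e) {
            // Only expected if cs is not registered under its name (see caveat above).
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = getBytesByName("h\u00e9llo", Charset.forName("UTF-8"));
        System.out.println(bytes.length); // prints 6: the e-acute takes two bytes in UTF-8
    }
}
```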

Assignee:: Unassigned

Reporter:: Roger Yeung (Inactive)

Votes:: 0

Watchers:: 1

Created:: 2008-10-27 16:54

Updated:: 2021-08-11 05:40

Resolved:: 2021-08-11 05:10

Imported:: 15/Sep/12 11:31 PM

Indexed:: 17/Jul/12 7:48 PM
