-
Bug
-
Resolution: Won't Fix
-
P2
-
None
-
1.2.2_014
-
x86
-
windows_2000
The Symantec JIT is causing a server application to hang on certain Intel
servers. The workaround is to disable the JIT and run the JVM in
interpreted mode, which makes the customer's application run slower. The
customer ships with JRE 1.2.2_005 and has tested with JRE 1.2.2_014; the
problem still exists. symcjit.dll version 3.10.0.107 is contained in both
versions.

Customer Problem Description:
While benchmarking our environment, we observed an unusual number of
LOCK# signal assertions on the Pentium bus. This signal literally locks
the entire memory for its duration. It is usually caused by atomic
operations that cross cache line boundaries. This is legacy x86 behavior
that most processor architectures do not support. If it happens
frequently, performance of the machine is degraded: in essence, it
forces the CPU to go out to the bus, and then turns an SMP system into a
UP by locking all the remaining processors and the I/O bridges out of the
bus.
Further analysis showed that the JIT compiler symcjit.dll in the Sun Java
distribution was responsible. This DLL was executing a Pentium XCHG
instruction on misaligned data (addresses that are not an integer
multiple of 4 bytes). The operative keyword here is 'misaligned data'.
The XCHG instruction by itself does not cause this. However, if the XCHG
operand is on a misaligned boundary, the processor must assert the
LOCK# signal to ensure atomic access. Note that this is a performance
issue even on a UP because the signal must be asserted regardless.
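
The two conditions mentioned above (an operand that is not 4-byte aligned,
and an access that crosses a cache line boundary) are not the same thing.
A minimal C sketch that tells them apart follows. The helper names are
hypothetical, and the 32-byte cache line size is an assumption typical of
Pentium-class processors; it is not stated in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size in bytes; not taken from the bug report. */
#define CACHE_LINE_BYTES 32u

/* Returns 1 if addr is a multiple of alignment.
   alignment must be a nonzero power of two. */
static int is_aligned(uintptr_t addr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr & (alignment - 1)) == 0;
}

/* Returns 1 if an access of size bytes starting at addr touches
   two different cache lines. */
static int crosses_cache_line(uintptr_t addr, size_t size)
{
    assert(size > 0 && size <= CACHE_LINE_BYTES);
    uintptr_t first_line = addr / CACHE_LINE_BYTES;
    uintptr_t last_line = (addr + size - 1) / CACHE_LINE_BYTES;
    return first_line != last_line;
}
```

Under these assumptions, a 4-byte operand at 0x1002 is misaligned but stays
within one cache line, while a 4-byte operand at 0x101e is misaligned and
also spans two lines. Logging both checks for the address passed to the
exchange function would show which case the JIT is actually hitting.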
It is very unusual to see misaligned data combined with locked access.
Most processor architectures support neither bus locking nor misaligned
access, and most compilers won't place data on misaligned boundaries. So
what is going on? I speculate that this XCHG instruction is hand-coded in
inline assembly and its operand is cast from a smaller data type, causing
it to be misaligned, or perhaps this is simply a bug.
Fixing this problem should be very easy. In symcjit.dll, ensure that the
operands of the XCHG instruction in the function at 500b8410 are 4-byte
aligned.
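
As an illustration only, the suggested fix might look like the following C11
sketch. The source of symcjit.dll is not available here, so the names
lock_word and checked_exchange are hypothetical, and atomic_exchange from
<stdatomic.h> stands in for the hand-written XCHG routine.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical lock word. _Alignas(4) makes the 4-byte alignment
   requirement explicit instead of relying on the compiler default. */
static _Alignas(4) atomic_int lock_word = 0;

/* Stores new_value into *target and returns the previous value.
   The assert rejects any target that is not 4-byte aligned, which is
   the condition the report asks the JIT to guarantee. */
static int checked_exchange(atomic_int *target, int new_value)
{
    assert(((uintptr_t)target & 3u) == 0);
    return atomic_exchange(target, new_value);
}
```

Calling checked_exchange(&lock_word, 1) returns the old value 0 and leaves 1
in lock_word. A pointer produced by casting from a smaller, unaligned type
would trip the assert in a debug build rather than silently reaching the
exchange.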

// This is the function (atomic exchange) that causes LOCK# signal
// assertions due to misaligned data.
kd> u 500b8410
500b8410 8b542404     mov   edx,[esp+0x4]
500b8414 8b442408     mov   eax,[esp+0x8]
500b8418 8702         xchg  [edx],eax      // the instruction issuing the locked bus cycles
500b841a c3           ret
500b841b 0500000000   add   eax,0x0
500b8420 50           push  eax
500b8421 51           push  ecx
500b8422 8b08         mov   ecx,[eax]
kd>

// This is the code that calls the atomic exchange.
kd> u 5009cbe0
5009cbe0 54           push  esp
5009cbe1 2434         and   al,0x34
5009cbe3 6808270c50   push  0x500c2708
5009cbe8 52           push  edx
5009cbe9 8944243c     mov   [esp+0x3c],eax
5009cbed 894c2440     mov   [esp+0x40],ecx
5009cbf1 e80a8e0100   call  500b5a00
5009cbf6 57           push  edi
5009cbf7 55           push  ebp
5009cbf8 e813b80100   call  500b8410       // call to the atomic exchange function
5009cbfd 8a460c       mov   al,[esi+0xc]
5009cc00 83c408       add   esp,0x8
5009cc03 a802         test  al,0x2
5009cc05 750a         jnz   5009cc11
5009cc07 8b4c2410     mov   ecx,[esp+0x10]
5009cc0b 56           push  esi

Here are the address maps that will help with debugging:
(See attached files: d1584.txt, symcjit.txt)