Type: Bug
Resolution: Fixed
Priority: P3
Release: 6
Build: b75
OS: generic
CPU: generic
Doug Lea writes:
----------------------------------------------
Background: I took the genetic algorithm Traveling Salesperson
Problem example program (src/test/loops/TSPExchangerTest)
and used it as a basis for talking about
parallel genetic algorithms etc. in my concurrency course.
In doing so I found a better way to structure the genetic
algorithm that removes a fair amount of overhead. After
doing this I noticed that the spin control we applied in
SynchronousQueue really ought to have also been applied in
Exchanger. Not doing this is a performance bug -- as
the overhead of other activities decreases, the Exchanger should
spin rather than block. Adjusting Exchanger to use exactly
the same spin control as SQ gives about a 40% performance
improvement on some runs of some tests.
The changes are simple, and identical to those in SQ. I checked them in.
(I also checked in the new TSPExchangerTest.)
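For readers unfamiliar with the idiom, here is a minimal sketch of
spin-then-park waiting of the kind referred to above. It is illustrative
only: the class, its names, and the spin constant are assumptions of mine,
not the JDK source. The point is that on a multiprocessor a waiter spins
briefly before falling back to LockSupport.park(), so short waits never
pay a full block/unblock round trip.

    import java.util.concurrent.atomic.AtomicReference;
    import java.util.concurrent.locks.LockSupport;

    // Hypothetical one-shot slot showing spin-then-park waiting.
    class OneShotSlot<V> {
        // Illustrative spin count; the real constants are tuned differently.
        private static final int MAX_SPINS =
            Runtime.getRuntime().availableProcessors() > 1 ? 1 << 10 : 0;

        private final AtomicReference<V> item = new AtomicReference<V>();
        private volatile Thread waiter;

        // Producer side: publish a non-null value, then wake any parked waiter.
        void offer(V v) {
            item.set(v);
            Thread w = waiter;
            if (w != null)
                LockSupport.unpark(w);
        }

        // Consumer side: spin for a while, then register and park.
        V await() {
            int spins = MAX_SPINS;
            V v;
            while ((v = item.get()) == null) {
                if (spins > 0) {
                    --spins;                       // busy-wait: value likely imminent
                } else {
                    waiter = Thread.currentThread();
                    if (item.get() == null)        // re-check before blocking
                        LockSupport.park(this);
                    waiter = null;
                }
            }
            return v;
        }
    }

Because both item and waiter have volatile semantics, the re-check before
park() cannot miss a value published by offer(), so no wakeup is lost; the
only cost of the race is a spare unpark permit.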
Aside: One general rule of thumb for seeing whether any non-lock-like
synchronizer would benefit from adaptive spins is to
check whether the average performance of a group of threads doing
little else but hitting the synchronizer improves significantly
when you have more threads than CPUs, as opposed to
the same number or fewer.
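Sketched below is one rough way to apply that rule of thumb to
java.util.concurrent.Exchanger. The harness, its class name, and its
iteration counts are assumptions of mine, not the test used here: it just
times pairs of threads that do little besides exchanging, once at roughly
the CPU count and once well above it, so the two averages can be compared.

    import java.util.concurrent.Exchanger;

    public class SpinRuleOfThumb {
        // Time 2*pairs threads; each pair shares its own Exchanger so no
        // thread is ever left without a partner.
        static long timePairs(int pairs, int exchangesPerThread)
                throws InterruptedException {
            Thread[] threads = new Thread[2 * pairs];
            for (int p = 0; p < pairs; p++) {
                Exchanger<Integer> ex = new Exchanger<>();
                for (int side = 0; side < 2; side++) {
                    threads[2 * p + side] = new Thread(() -> {
                        try {
                            for (int j = 0; j < exchangesPerThread; j++)
                                ex.exchange(j);   // little else but hitting the synchronizer
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                }
            }
            long start = System.nanoTime();
            for (Thread t : threads) t.start();
            for (Thread t : threads) t.join();
            return System.nanoTime() - start;
        }

        public static void main(String[] args) throws InterruptedException {
            int cpus = Runtime.getRuntime().availableProcessors();
            // First about as many threads as CPUs, then twice as many.
            for (int pairs : new int[] { Math.max(1, cpus / 2), cpus }) {
                long ns = timePairs(pairs, 200_000);
                System.out.printf("%d threads: %.1f ms%n", 2 * pairs, ns / 1e6);
            }
        }
    }

Roughly speaking, if the oversubscribed run looks markedly better per
exchange, parking overhead is what hurts the undersubscribed case, and
that is the case adaptive spinning helps.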
----------------------------------------------
Doug Lea writes:
I think I'm done with Exchangers for now. Summary:
1. Use the same spin control as SynchronousQueue.
The Node wait/signal code is now essentially identical to that
in SQ, and I'm pretty sure this is the best approach I know of in
both cases.
2. Integrate the back-off time constants with the above, so that
on MPs the shortest timeout is guaranteed to spin, not block,
which allows decreasing the base backoff.
3. Bound the sizes of backoffs and elimination slots. Otherwise, you
cannot guarantee to choose them correctly at random. (This was
an existing problem, although one that won't hit until people
have 65536-way machines :-)
4. Use a ThreadLocal xor-shift random number generator rather than
java.util.Random. Otherwise the effects of elimination are partially
lost, since threads contend on RNG updates. This didn't show up
before because of the hidden contention on park/unpark locks, which
is now greatly reduced. (A sketch of the thread-local xor-shift
pattern follows this list.)
5. In doExchange, rearrange the order of trying to CAS in "me"
versus CASing out "you". This is a small thing, but the way it
was, it was possible for an Exchanger used by only two threads
to hit the elimination array, contradicting the internal
documentation.
6. Other gratuitous changes to make it read/look better to me.
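As a companion to item 4, here is a minimal sketch of a thread-local
xor-shift generator. The class name, seeding, and shift triple are my own
illustration rather than the JDK code; the point is only that each thread
advances its own seed, so random backoff and elimination-slot choices
never contend on a shared java.util.Random instance.

    // Hypothetical per-thread xor-shift RNG (Marsaglia's 13/17/5 variant).
    class ThreadLocalXorShift {
        private static final ThreadLocal<int[]> SEED = new ThreadLocal<int[]>() {
            @Override protected int[] initialValue() {
                int s = (int) System.nanoTime();
                return new int[] { s == 0 ? 1 : s };  // xorshift state must be nonzero
            }
        };

        // Advance this thread's seed; no shared state, hence no contention.
        static int next() {
            int[] seed = SEED.get();
            int x = seed[0];
            x ^= x << 13;
            x ^= x >>> 17;
            x ^= x << 5;
            seed[0] = x;
            return x;
        }

        // Non-negative index in [0, n), e.g. to pick an elimination slot
        // or a randomized backoff.
        static int nextIndex(int n) {
            return (next() & Integer.MAX_VALUE) % n;
        }
    }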
The net effect is surprisingly big: more than a factor-of-two
speedup on the TSP test on swym (the University of Rochester 16-way
SPARC) and on a 4-way Opteron. (This is the most believable performance
test we have.) Results are similar for other machines and other tests.
Here's TSP on swym. (6Nov05 is the last CVS update before making
this set of changes.)
Times (seconds)
Threads    6Nov05    12Dec05     tiger
   2      172.418     72.997   246.356
   4       90.005     37.317   646.314
   6       60.503     25.757
   8       45.629     22.075
  10       38.054     17.697
  12       31.711     14.973
  14       26.772     13.668
  16       23.427     12.500
  18       23.625     12.640
  20       22.913     13.171
  22       23.882     12.850
(I was going to show all times using the tiger version of Exchanger
but gave up waiting for them.)
Relates to: JDK-6384622 Exchanger.exchange(null) may hang (Closed)