-
Type:
Enhancement
-
Resolution: Fixed
-
Priority:
P4
-
Affects Version/s: None
-
Component/s: hotspot
-
b11
A quite large regression that occurred in Pet Clinic that happened if you turned on Compact Object Headers. It was found that a large portion of that regression could be attributed to not finding the monitor in the Object Monitor Cache, and because we couldn't access the Object Monitor Table from C2 generated code, we often had to take the slow path.
This new OMT (Object Monitor Table) implementation is a concurrent hash table specifically designed so that it (in the normal case) can be searched from C2 generated code.
In its simplest form it consists of:
1. An array of pointers.
2. The size (capacity) of the array, which is always a power of two.
When you want to find a monitor associated with an object, you extract the hash value of the object. Then calculate an index by taking the hash value and bit-wise AND it with the capacity mask (e.g. size-1) of the OMT. Now use that index into the OMT's array of pointers. If the pointer is non null, check if it's a monitor pointer that is associated with the object. If so you're done. If the pointer is non null, but associated with another object, you start looking at (index+1), (index+2) and so on, until you either find the correct monitor, or a null pointer. Finding a null pointer means that the monitor is simply not in the OMT.
If the size of the pointer array is significantly larger than the number of pointers in it, the chance of finding the monitor in the hash index (without any further linear searching) is quite high. It is also straight forward to generate C2 code for this, which for the fast path doesn't contain any branching at all. See: C2_MacroAssembler::fast_lock().
When the number of monitors (pointers in the array) reaches above the allowed limit (defined by the GROW_LOAD_FACTOR symbol) we need to grow the table.
A simple description of how growing the OMT is done, is to say that we allocate a new table (twice as large as the old one), and then copy all the old monitor pointers from the old table to the new.
But since the OMT is a concurrent hash table and things needs to work for other clients of the OMT while we grow it, it's gets a bit more complicated.
Both the new and (potentially several) old table(s) may exist at the same time. The newest is always called the "current", and the older ones are singly linked using a "prev" pointer.
As soon as we have allocated and linked in the new "current" OMT, all new monitor pointers will be added to the new table. Effectively making the atomic switch from "old current" to "new current" a linearization point.
After that we start to go through all the indexes in the old table. If the index is empty (the pointer is null) we put a "tombstone" into that index, which will prevent any future concurrent insert ending up in that index.
If the index contains a monitor pointer, we insert that monitor pointer into the OMT which can be considered as one generation newer. If the index contains a "removed" pointer, we just ignore it.
We use special pointer values for "tombstone" and "removed". Any pointer that is not null, not a tombstone and not removed, is considered to be a pointer to a monitor.
When all the monitor pointers from an old OMT has been transferred to the new OMT, the old table is unlinked.
This copying from an old OMT to one generation newer OMT, will continue until all the monitor pointers from old OMTs has been transferred to the newest "current" OMT.
The memory for old, unlinked OMTs will be freed after a thread-local handshake with all Java threads.
Searching the OMT for a monitor pointer while there are several generations of the OMT, will start from the oldest OMT.
A note about the GROW_LOAD_FACTOR: In order to guarantee that the add and search algorithms can't loop forever, we must make sure that there is at least one null pointer in the array to stop them from dead looping. Furthermore, when we grow the OMT, we must make sure that the new "current" can accommodate all the monitors from all older OMTs, while still being so sparsely populated that the C2 generated code will likely find what it's searching for at the hash index, without needing further linear searching. The grow load factor is set to 12.5%, which satisfies the above requirements. Don't change it for fun, it might backfire.
This new OMT (Object Monitor Table) implementation is a concurrent hash table specifically designed so that it (in the normal case) can be searched from C2 generated code.
In its simplest form it consists of:
1. An array of pointers.
2. The size (capacity) of the array, which is always a power of two.
When you want to find a monitor associated with an object, you extract the hash value of the object. Then calculate an index by taking the hash value and bit-wise AND it with the capacity mask (e.g. size-1) of the OMT. Now use that index into the OMT's array of pointers. If the pointer is non null, check if it's a monitor pointer that is associated with the object. If so you're done. If the pointer is non null, but associated with another object, you start looking at (index+1), (index+2) and so on, until you either find the correct monitor, or a null pointer. Finding a null pointer means that the monitor is simply not in the OMT.
If the size of the pointer array is significantly larger than the number of pointers in it, the chance of finding the monitor in the hash index (without any further linear searching) is quite high. It is also straight forward to generate C2 code for this, which for the fast path doesn't contain any branching at all. See: C2_MacroAssembler::fast_lock().
When the number of monitors (pointers in the array) reaches above the allowed limit (defined by the GROW_LOAD_FACTOR symbol) we need to grow the table.
A simple description of how growing the OMT is done, is to say that we allocate a new table (twice as large as the old one), and then copy all the old monitor pointers from the old table to the new.
But since the OMT is a concurrent hash table and things needs to work for other clients of the OMT while we grow it, it's gets a bit more complicated.
Both the new and (potentially several) old table(s) may exist at the same time. The newest is always called the "current", and the older ones are singly linked using a "prev" pointer.
As soon as we have allocated and linked in the new "current" OMT, all new monitor pointers will be added to the new table. Effectively making the atomic switch from "old current" to "new current" a linearization point.
After that we start to go through all the indexes in the old table. If the index is empty (the pointer is null) we put a "tombstone" into that index, which will prevent any future concurrent insert ending up in that index.
If the index contains a monitor pointer, we insert that monitor pointer into the OMT which can be considered as one generation newer. If the index contains a "removed" pointer, we just ignore it.
We use special pointer values for "tombstone" and "removed". Any pointer that is not null, not a tombstone and not removed, is considered to be a pointer to a monitor.
When all the monitor pointers from an old OMT has been transferred to the new OMT, the old table is unlinked.
This copying from an old OMT to one generation newer OMT, will continue until all the monitor pointers from old OMTs has been transferred to the newest "current" OMT.
The memory for old, unlinked OMTs will be freed after a thread-local handshake with all Java threads.
Searching the OMT for a monitor pointer while there are several generations of the OMT, will start from the oldest OMT.
A note about the GROW_LOAD_FACTOR: In order to guarantee that the add and search algorithms can't loop forever, we must make sure that there is at least one null pointer in the array to stop them from dead looping. Furthermore, when we grow the OMT, we must make sure that the new "current" can accommodate all the monitors from all older OMTs, while still being so sparsely populated that the C2 generated code will likely find what it's searching for at the hash index, without needing further linear searching. The grow load factor is set to 12.5%, which satisfies the above requirements. Don't change it for fun, it might backfire.
- relates to
-
JDK-8365493 Regression on Pet Clinic app with Compact Object Headers
-
- Open
-
-
JDK-8377455 Replace LinkedList with GrowableArray in VM.ThreadDump's OMTable
-
- Resolved
-
- links to
-
Commit(master)
openjdk/jdk/119108c0
-
Review(master)
openjdk/jdk/29383