Bug
Resolution: Fixed
P3
21, 22
b03
Issue | Fix Version | Assignee | Priority | Status | Resolution | Resolved In Build |
---|---|---|---|---|---|---|
JDK-8310432 | 21.0.1 | Philip Race | P3 | Resolved | Fixed | b02 |
JDK-8310256 | 21 | Philip Race | P3 | Resolved | Fixed | b28 |
JDK-8335635 | 17.0.13-oracle | Harshitha Onkar | P3 | Resolved | Fixed | b02 |
JDK-8336431 | 17.0.13 | Matthias Baesken | P3 | Resolved | Fixed | b01 |
JDK-8337842 | 11.0.26-oracle | Harshitha Onkar | P3 | Resolved | Fixed | b01 |
JDK-8340305 | 8u441 | Renjith Kannath Pariyangad | P3 | Resolved | Fixed | b01 |
Whilst investigating another issue regarding capture of full screen windows, I found a
way to make the crash more reproducible.
In my VirtualBox VM, with just one CPU configured, I could trigger a crash 80% of the time.
With 6 CPUs configured (my default), it rarely crashed.
The reported error was typically something like
munmap_chunk(): invalid pointer
or
double free or corruption (out)
or
tcache_thread_shutdown(): unaligned tcache chunk detected
The error came right during clean up, and it didn't crash
if I commented out the calls to
fp_pw_thread_loop_stop(pw.loop);
fp_pw_thread_loop_destroy(pw.loop);
I turned on PipeWire debugging at various levels from 1 to 5, e.g.
export PIPEWIRE_DEBUG=2
That didn't point to a smoking gun, but it seemed like we perhaps
had a threading-related issue, since the number of cores affected reproducibility.
After a bit of studying the code with this in mind, I think I found the problem.
We are using the thread loop functions (i.e. those mentioned above), and in such
a case it is required to use pw_thread_loop_lock() / pw_thread_loop_unlock()
around calls:
https://pipewire.github.io/pipewire/page_thread_loop.html
"The lock needs to be held whenever you call any PipeWire function that uses an object associated with this loop."
That's not super-specific, but I noticed that an earlier part of the clean up probably needed these calls but was missing them.
Adding them as shown here seems to completely cure the crashes:
--- a/src/java.desktop/unix/native/libawt_xawt/awt/screencast_pipewire.c
+++ b/src/java.desktop/unix/native/libawt_xawt/awt/screencast_pipewire.c
@@ -887,8 +887,10 @@ JNIEXPORT jint JNICALL Java_sun_awt_screencast_ScreencastHelper_getRGBPixelsImpl
screenProps->captureData = NULL;
screenProps->shouldCapture = FALSE;
+ fp_pw_thread_loop_lock(pw.loop);
fp_pw_stream_set_active(screenProps->data->stream, FALSE);
fp_pw_stream_disconnect(screenProps->data->stream);
+ fp_pw_thread_loop_unlock(pw.loop);
}
}
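To illustrate why the lock matters, here is a minimal, self-contained sketch of the same lock / disconnect / unlock ordering. It does not use PipeWire at all: `fake_stream`, `fake_loop`, `loop_main`, `disconnect_locked` and `run_demo` are hypothetical stand-ins, with a plain pthread mutex playing the role of the thread loop lock and a worker thread playing the role of the loop thread.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins, NOT the PipeWire API: the "stream" is a heap
 * buffer and the "thread loop" is a worker thread plus a mutex. */
struct fake_stream {
    int *buffer;     /* freed on disconnect */
    bool connected;
};

struct fake_loop {
    pthread_mutex_t lock;
    pthread_t thread;
    bool running;
    struct fake_stream *stream;
};

/* Worker: only touches the stream while holding the loop lock. */
static void *loop_main(void *arg) {
    struct fake_loop *loop = arg;
    bool keep_going = true;
    while (keep_going) {
        pthread_mutex_lock(&loop->lock);
        keep_going = loop->running;
        if (keep_going && loop->stream->connected) {
            /* Without the lock this could race with the free() below. */
            loop->stream->buffer[0]++;
        }
        pthread_mutex_unlock(&loop->lock);
        sched_yield();
    }
    return NULL;
}

/* Caller-side teardown, in the same lock / disconnect / unlock order. */
static void disconnect_locked(struct fake_loop *loop) {
    pthread_mutex_lock(&loop->lock);
    loop->stream->connected = false;
    free(loop->stream->buffer);
    loop->stream->buffer = NULL;
    pthread_mutex_unlock(&loop->lock);
}

/* Returns 0 when start, locked disconnect and stop all complete. */
int run_demo(void) {
    struct fake_stream stream = { calloc(1, sizeof(int)), true };
    struct fake_loop loop = { .running = true, .stream = &stream };
    if (stream.buffer == NULL) {
        return 1;
    }
    pthread_mutex_init(&loop.lock, NULL);
    if (pthread_create(&loop.thread, NULL, loop_main, &loop) != 0) {
        free(stream.buffer);
        pthread_mutex_destroy(&loop.lock);
        return 1;
    }
    sched_yield();                 /* give the worker a chance to run */
    disconnect_locked(&loop);      /* safe: worker cannot be mid-access */
    pthread_mutex_lock(&loop.lock);
    loop.running = false;          /* ask the worker to exit */
    pthread_mutex_unlock(&loop.lock);
    pthread_join(loop.thread, NULL);
    pthread_mutex_destroy(&loop.lock);
    assert(stream.buffer == NULL && !stream.connected);
    return 0;
}
```

Calling `run_demo()` from a `main` should return 0. If the locking in `disconnect_locked` is removed, the worker may dereference the buffer after it has been freed, which is the kind of heap corruption the glibc messages above tend to report.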
I also noticed that the function doCleanUp() has
if (screenProps->data->stream) {
fp_pw_stream_disconnect(screenProps->data->stream);
fp_pw_stream_destroy(screenProps->data->stream);
screenProps->data->stream = NULL;
}
This
(1) Again doesn't have any locking. That doesn't seem to be causing a problem here, but I think it should be added anyway.
(2) Seems to repeat the disconnect call. I'm inclined to leave this alone for now since it doesn't seem to cause harm,
but I don't know that it's guaranteed to be idempotent. This needs more analysis, probably under a separate bug ID.
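As a rough sketch of what point (1) might look like, the stream teardown could be wrapped in the same lock / unlock pair. The code below is not the JDK source: the `stub_*` functions are hypothetical recording stubs standing in for the `fp_pw_*` function pointers, `cleanup_stream_locked` and `run_cleanup_sketch` are made-up names, and whether the destroy call should also sit inside the lock is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical recording stubs standing in for the fp_pw_* pointers.
 * They only append their name to a log so the call order can be checked. */
static char call_log[128];

static void record(const char *name) {
    strcat(call_log, name);
    strcat(call_log, ";");
}

struct stream { int unused; };
struct loop { int unused; };
struct screen_data { struct stream *stream; };

static void stub_thread_loop_lock(struct loop *l) { (void)l; record("lock"); }
static void stub_thread_loop_unlock(struct loop *l) { (void)l; record("unlock"); }
static void stub_stream_disconnect(struct stream *s) { (void)s; record("disconnect"); }
static void stub_stream_destroy(struct stream *s) { (void)s; record("destroy"); }

/* Same calls as the doCleanUp() snippet, wrapped in lock / unlock.
 * Assumption: destroy is also done while holding the lock. */
static void cleanup_stream_locked(struct loop *l, struct screen_data *data) {
    if (data->stream) {
        stub_thread_loop_lock(l);
        stub_stream_disconnect(data->stream);
        stub_stream_destroy(data->stream);
        data->stream = NULL;
        stub_thread_loop_unlock(l);
    }
}

/* Runs the cleanup twice and returns the recorded call order. */
const char *run_cleanup_sketch(void) {
    static struct stream s;
    static struct loop l;
    struct screen_data data = { &s };

    call_log[0] = '\0';
    cleanup_stream_locked(&l, &data);
    assert(data.stream == NULL);
    cleanup_stream_locked(&l, &data);  /* no-op: stream is already NULL */
    return call_log;
}
```

The NULL check keeps a second cleanup call from touching the stream again, but it does not answer question (2): the sketch says nothing about whether a repeated disconnect on a live stream is safe in PipeWire itself.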
Backported by:
- JDK-8310256 Occasional crashes with pipewire screen capture on Wayland (Resolved)
- JDK-8310432 Occasional crashes with pipewire screen capture on Wayland (Resolved)
- JDK-8335635 Occasional crashes with pipewire screen capture on Wayland (Resolved)
- JDK-8336431 Occasional crashes with pipewire screen capture on Wayland (Resolved)
- JDK-8337842 Occasional crashes with pipewire screen capture on Wayland (Resolved)
- JDK-8340305 Occasional crashes with pipewire screen capture on Wayland (Resolved)
Links to:
- Commit openjdk/jdk21/3698a022
- Commit openjdk/jdk/d3d0dbc3
- Commit(master) openjdk/jdk17u-dev/a1e7701e
- Review openjdk/jdk21/25
- Review openjdk/jdk/14428
- Review(master) openjdk/jdk17u-dev/2712