The GC code can trigger a guard page fault within a tiny timing window, and the fault crashes the VM when it occurs. A theoretical case looks like this:
1. Thread "foo" is a newly created thread, so most of its stack pages are
still guard pages. The operating system does not allocate (commit) those
pages until thread "foo" first touches them.
2. Thread "foo" now enters a function that has a large local frame, so the
code allocates the frame by adjusting the stack pointer. Right after this
instruction has executed, the thread is suspended by the GC thread.
3. The GC thread scans the stack of thread "foo" all the way from its base
to the current stack pointer, and touches the guard page at the stack pointer.
The operating system raises a page fault exception because the GC thread
is touching the guard page of another thread.
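
The three steps above can be illustrated with a small model. This is only a
sketch of the race, not the VM's actual code: the names PAGE_SIZE, PageState,
ModelStack, GuardPageFault and gc_scan are all hypothetical, the 4096-byte
page size is an assumption, and the stack is simplified to a list of pages
in which page 0 is the stack base.

```python
from enum import Enum

PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical value)


class PageState(Enum):
    COMMITTED = 1  # already touched by the owning thread
    GUARD = 2      # next page to be committed when the owner touches it
    RESERVED = 3   # address space only, not yet committed


class GuardPageFault(Exception):
    """Models the fault raised when a non-owner thread touches an uncommitted page."""

    def __init__(self, page):
        super().__init__(f"non-owner thread touched uncommitted page {page}")
        self.page = page


class ModelStack:
    """Toy downward-growing stack: page 0 is the base, higher indexes are deeper."""

    def __init__(self, total_pages, committed_pages):
        reserved = total_pages - committed_pages - 1
        self.pages = ([PageState.COMMITTED] * committed_pages
                      + [PageState.GUARD]
                      + [PageState.RESERVED] * reserved)
        self.sp_offset = 0  # distance of the stack pointer from the base, in bytes

    def allocate_frame(self, frame_bytes):
        # Step 2: only the stack pointer moves; no page is touched yet.
        self.sp_offset += frame_bytes


def gc_scan(stack):
    """Step 3: scan every page from the base to the stack pointer as a non-owner."""
    last_page = stack.sp_offset // PAGE_SIZE
    for page in range(last_page + 1):
        if stack.pages[page] is not PageState.COMMITTED:
            raise GuardPageFault(page)
    return last_page + 1  # number of pages scanned


if __name__ == "__main__":
    foo = ModelStack(total_pages=8, committed_pages=2)  # step 1: new thread
    foo.allocate_frame(100)
    print("pages scanned:", gc_scan(foo))               # small frame: scan succeeds
    foo.allocate_frame(2 * PAGE_SIZE)                   # step 2: large frame, then suspend
    try:
        gc_scan(foo)                                    # step 3: scan reaches the guard page
    except GuardPageFault as fault:
        print("fault:", fault)
```

In this model the scan fails only when the thread is suspended between moving
the stack pointer and first touching the new frame, which matches the narrow
timing window described above.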
Under a multi-threaded stress test, the crash can be reproduced within 2 hours
using the ThreadGCBug test program with java_g on win32. The optimized version
seems much less likely to hit this bug.