Today, when the ZVerifyViews flag is turned on, we unmap all bad views. The intention is to catch stray-pointer bugs.
The current implementation takes a short-cut and unmaps all memory en masse. This works on Linux, but not on Windows, where we must be precise about what we unmap.
There are three places where allocated pages are registered today:
- In the page table - actively used
- In the page cache - free pages waiting to be used
- In-flight from the alloc queue
The last item makes it hard to visit all mapped pages. A page that is about to satisfy a request in the alloc queue could come from the page cache and already be mapped. We currently have no way to explicitly unmap such pages.
One solution to this is to always (when ZVerifyViews is used) unmap the view when a page is put into the page cache, and map it in again when it is taken out of the page cache.
This way, only in-use pages have a valid view into the memory. The checks become even stricter with this approach: previously, pages in the page cache always had the good view mapped, so stray pointers into that memory wouldn't be caught immediately.
The proposed solution has one drawback: the NUMA id initialization currently happens when pages are put into the page cache. The patch has to deal with this and moves that initialization to the initial mapping of the page. It also has to handle the case where larger pages are split into small pages. Only small pages care about the NUMA id, so when pages in the page cache are split, we have to be able to query the NUMA id even though the page has been unmapped.
We might want to explore alternatives to the proposed solution, since fetching the NUMA id when pages are split causes a slight abstraction violation between the page allocator and the page cache: the page allocator needs to be part of page splitting in the cache.
Update: We figured out a way to also visit the "In-flight from the alloc queue" pages. That solution has less collateral damage and will be the proposed solution.