Analyzing Page Fault Anomalies
Windows reports some mysterious anomalies via its process-level page fault counter. Can we detect patterns in them, and determine what Windows is doing behind the scenes?
This is the seventh video in Part 3 of the Performance-Aware Programming series. Please see the Table of Contents to quickly navigate through the rest of the course as it is updated weekly. The listings referenced in the video (listings 116, 119 and 120) are available on GitHub.
In the previous post, I suggested that the Windows anomalies we observed could be explained once we understood four-level paging. Specifically, we wanted to know:
1. Why did we see extra page faults? We allocated and then touched a specific number of 4k pages, and yet we saw more than that number of page faults. Why?
2. What were those weird “flat anomalies” we saw in our Windows pre-mapping sawtooth pattern? Can we construct a theory that explains when they happen?
Now that we do understand four-level paging, it’s time to investigate these anomalies properly. We’ll start with number one.
In our previous tests, we didn’t really try to detect when the extra page faults were happening. We noticed that they were there, but we didn’t try to establish a pattern. So the first thing we’d like to do is write a test specifically designed to determine when we are getting these spurious page faults.
In Listing 116, I’ve written a simple main that goes through memory and touches each page while continuously testing to see whether Windows reports more than the expected number of page faults. To make detection easier, I go through the pages backwards because we know that Windows will not do its “pre-fault ahead by 16” trick when we touch memory backwards. This means we should consistently see one page fault per page touched the entire time except when we have these spurious page fault anomalies.
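To make the detection idea concrete, here is a minimal C sketch of it. This is not Listing 116: every name in it (TouchBackward, read_fault_count, fake_counter, RunSimulated) is hypothetical, and the fault counter is passed in as a callback so the logic can be tried on any platform. On Windows, the callback would presumably return something like the PageFaultCount value that GetProcessMemoryInfo fills in; here a simulated counter stands in for it, injecting one extra fault every few pages.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define PAGE_SIZE 4096u /* assumed 4k pages, as in the text */

/* Hypothetical callback: returns the cumulative page fault count. */
typedef uint64_t read_fault_count(void *User);

typedef struct
{
    uint64_t ExtraFaults;   /* faults beyond one-per-page at the end of the walk */
    uint64_t IncreaseCount; /* how many times the extra count went up */
} extra_fault_stats;

/* Touch pages from last to first, expecting exactly one fault per touch. */
static extra_fault_stats TouchBackward(uint8_t *Memory, uint64_t PageCount,
                                       read_fault_count *ReadFaults, void *User)
{
    extra_fault_stats Stats = {0, 0};
    uint64_t StartFaults = ReadFaults(User);

    for(uint64_t TouchCount = 1; TouchCount <= PageCount; ++TouchCount)
    {
        uint64_t PageIndex = PageCount - TouchCount;
        Memory[PageIndex*PAGE_SIZE] = (uint8_t)PageIndex; /* touch one byte in the page */

        uint64_t Observed = ReadFaults(User) - StartFaults;
        uint64_t Extra = (Observed > TouchCount) ? (Observed - TouchCount) : 0;
        if(Extra > Stats.ExtraFaults)
        {
            printf("Page %llu: %llu extra faults\n",
                   (unsigned long long)PageIndex, (unsigned long long)Extra);
            Stats.ExtraFaults = Extra;
            ++Stats.IncreaseCount;
        }
    }

    return Stats;
}

/* Simulated counter (an assumption for illustration, not real OS behavior):
   one fault per read after the first, plus one extra every ExtraEvery reads. */
typedef struct
{
    uint64_t Calls;
    uint64_t Count;
    uint64_t ExtraEvery; /* 0 means never inject an extra fault */
} fake_counter;

static uint64_t FakeRead(void *User)
{
    fake_counter *Fake = (fake_counter *)User;
    if(Fake->Calls > 0)
    {
        Fake->Count += 1;
        if(Fake->ExtraEvery && ((Fake->Calls % Fake->ExtraEvery) == 0))
        {
            Fake->Count += 1;
        }
    }
    ++Fake->Calls;
    return Fake->Count;
}

/* Convenience wrapper: allocate the pages and run the walk against the fake counter. */
static extra_fault_stats RunSimulated(uint64_t PageCount, uint64_t ExtraEvery)
{
    extra_fault_stats Stats = {0, 0};
    uint8_t *Memory = (uint8_t *)malloc(PageCount*PAGE_SIZE);
    if(Memory)
    {
        fake_counter Fake = {0, 0, ExtraEvery};
        Stats = TouchBackward(Memory, PageCount, FakeRead, &Fake);
        free(Memory);
    }
    return Stats;
}
```

Calling RunSimulated(16, 4) walks 16 pages against a counter that adds an extra fault on every fourth read, so the walk should report four increases. With a real OS counter, the same comparison of observed faults against pages touched is what exposes the spurious faults.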
I look for unexpected page faults by counting how many extra page faults there have been, and printing out any time this number goes up. I also keep track of which page index we touched the last time the number went up, so I can report the number of pages it took to get to the next spurious page fault: