This is the fifth video in Part 3 of the Performance-Aware Programming series. Please see the Table of Contents to quickly navigate through the rest of the course as it is updated weekly. The listings referenced in the video (listings 112, 113, 114 and 115) are available on GitHub.
In the previous video we learned what page faults were, and how to detect that they were happening. We also saw that they could lead to performance penalties we weren't expecting to have.
I was planning on a fairly brief, shallow coverage of page faults, because they're not that critical to understand for performance-aware programming. If you want to really be an expert optimizer, it probably makes sense to learn a lot about how the OS manages memory pages. But since we’re just trying to cover the basics here, I didn’t consider it essential to know all the details of OS paging behavior.
However, based on the feedback from the previous post, it seems like this is a rather hot topic for a lot of people taking the course. So, in order to make sure there is enough information on how memory paging works, I’m going to take a little more time to demonstrate a few more advanced memory paging concepts, since people seem to have an appetite for it!
Let’s start with a question: what does the OS page fault counter actually count? Remember, we are writing our code in user mode. We are not the kernel. So we don’t actually know when interrupts are happening. Unless we install special drivers, or run a kernel debugger, we are using operating-system-supplied counters to try to determine what is really going on.
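To make that concrete: all we can do from user mode is ask the OS for its counters. On Windows that is, for example, the PageFaultCount field returned by GetProcessMemoryInfo. As a rough illustration of the same idea, here is a minimal Python sketch that does the equivalent on Linux via getrusage — note that Linux and Python are my substitutions for illustration, not what the course itself uses:

```python
import resource

# From user mode we never observe the page fault interrupts themselves.
# All we can do is read whatever counters the kernel chooses to expose --
# here, Linux's per-process fault counters via getrusage(), the rough
# analog of Windows' GetProcessMemoryInfo() PageFaultCount.
usage = resource.getrusage(resource.RUSAGE_SELF)
print("minor (soft) faults:", usage.ru_minflt)
print("major (hard) faults:", usage.ru_majflt)
```

Whether those numbers mean "interrupts taken" or "pages mapped" is exactly the question at hand — the counter's name alone doesn't tell us.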
So yes, in the last post, we did measure the number of “page faults” that were happening. We could assume that these refer to actual page fault interrupts that are occurring in our process. But are they?
This might seem like an odd question to ask. Why would an OS counter called “page faults” count something other than page faults?
Well, it might count real page faults. It might be an integer per process that is incremented whenever the page fault interrupt handler is triggered while a thread from that process is running. That’s certainly a possible thing you could imagine Windows doing.
But it might also just be a count of how many pages have had their mapping updated. In other words, Windows might not be counting the interrupts themselves, but the number of pages mapped during the interrupt. If it happens to only map one page per interrupt, then these will be the same — but if it maps more than one page per interrupt, then they could differ substantially.
While subtle, there is an important difference here. For example, if Windows mapped 64 pages at a time, then an interrupt counter would only register 2 interrupts if the user touched 128 pages. But a counter that instead tracked page mappings would register 128, even though only 2 interrupts had occurred. And we do care about this difference, because interrupts are expensive, so taking 2 interrupts to map 128 pages will presumably be much faster than taking 128 interrupts to map 128 pages!
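To make that arithmetic concrete, here is a tiny sketch of the two hypothetical counting schemes. The 64-pages-per-interrupt batching is just an assumed number for illustration, not something we have established about Windows:

```python
PAGES_TOUCHED = 128
PAGES_PER_FAULT = 64  # hypothetical: assume the OS maps 64 pages per interrupt

# Scheme A: the counter counts interrupts taken.
interrupt_count = PAGES_TOUCHED // PAGES_PER_FAULT

# Scheme B: the counter counts pages whose mapping was updated.
mapped_page_count = PAGES_TOUCHED

print(interrupt_count)    # 2
print(mapped_page_count)  # 128
```

Same workload, wildly different readings — which is why we can’t tell from the counter’s name alone which scheme Windows uses.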
So, which one of these alternatives is it? What does Windows actually count? How could we figure this out in general, for any operating system that provides us a “page fault” counter? And can this tell us anything about how the operating system decides to map pages?
In the previous post, we made the assumption that the page fault count was basically a count of the actual interrupts. This is a reasonable assumption, because the number of page faults we observed was close to the total number of pages we used, and our performance dropped substantially when the page fault count was high. But, as I mentioned in that post, we didn’t really know. It could be that Windows was mapping multiple pages at a time, because the fault counting we were doing wouldn’t have been able to distinguish that.
Well, now we’d like to actually know. So what kind of test could we construct that would tell us more definitively?
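One possible shape for such a test — sketched here in Python on Linux using the ru_minflt counter, rather than the Windows counters the course actually uses — is to touch a fresh region one page at a time and record how the counter moves after each touch:

```python
import mmap
import resource

PAGE_SIZE = resource.getpagesize()
NUM_PAGES = 256

# Anonymous memory: reserved now, but pages are not mapped until touched.
buf = mmap.mmap(-1, NUM_PAGES * PAGE_SIZE)

deltas = []
prev = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
for page in range(NUM_PAGES):
    buf[page * PAGE_SIZE] = 1  # first write to this page triggers a fault
    now = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    deltas.append(now - prev)
    prev = now

print("total counted:", sum(deltas))
print("first 16 deltas:", deltas[:16])
```

If the counter ticks up by exactly one for each page we touch, that is consistent with one-page-at-a-time mapping; a jump followed by a run of zeros would instead suggest the OS mapped a batch of pages at once. (On its own this still can’t separate "counts interrupts" from "counts pages mapped" when the two happen to coincide — that is precisely the ambiguity the next experiments need to resolve.)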