We've seen that OS-reported "page faults" can dramatically affect our file read performance. What exactly are they, and why do they slow down our program?
This is the fourth video in Part 3 of the Performance-Aware Programming series. Please see the Table of Contents to quickly navigate through the rest of the course as it is updated weekly. The listings referenced in the video (listings 108, 109, 110 and 111) are available on the github.
In the previous video, we saw that “page faults” were causing our application to get slower read performance than it otherwise would. But what exactly is a “page fault”, and why does it slow us down?
You've surely heard of “virtual memory”. Everyone has heard of it at this point. Even non-technical computer users are vaguely familiar with the concept.
Virtual memory, as we know, is the idea that we’d like to be able to work with more total memory than exists as physical RAM in our computer. If we have plenty of disk space available to use for storage, ideally we would like to extend the amount of total memory we can use by swapping pieces of memory to and from disk as necessary.
Sure, this will sometimes be slower: it may cause hiccups when we swap, or might even be pathologically slow in some cases. But most of the time, it “just works” and lets us deal with situations where we have, for example, opened too many large-footprint applications at the same time.
Another closely related concept — but one less frequently mentioned — is the idea of memory protection. This is the idea that applications should not be allowed to write to memory reserved for other applications or the operating system. This is a critical feature of modern computing, and we rely on it not only for security, but also to increase reliability — without it, any individual buggy app could easily overwrite some important data in the operating system kernel, and crash the whole computer.
Virtual memory and memory protection aren’t free. They both require a substantial amount of work from both the CPU and the OS. It’s specifically because of this extra work that we have these “page faults” the OS counters have alerted us to.
We wouldn’t have such a thing if we were still on an 8086, which supported neither virtual memory nor memory protection. As we saw in the first part of this course, on that CPU, memory was addressed as a linear series of bytes. There was an effective address calculation that combined some registers into a 16-bit offset. Then there was a segment offset operation that combined that offset with a segment register to produce the final 20-bit memory address. This memory address was then used by the memory bus to access physical RAM.
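As a rough sketch of those two steps, the 8086 address computation can be written out in C. The function names here are hypothetical, chosen only for illustration. The arithmetic is the standard 8086 scheme: the segment register is shifted left by 4 bits, added to the 16-bit offset, and the result is truncated to 20 bits.

```c
#include <stdint.h>

/* Step 1: effective address. Registers and a displacement are summed
   into a 16-bit offset; the cast makes the 16-bit wraparound explicit.
   (Hypothetical helper name, for illustration only.) */
static uint16_t effective_address_8086(uint16_t base, uint16_t index,
                                       uint16_t displacement)
{
    return (uint16_t)(base + index + displacement);
}

/* Step 2: segment offset. The segment register is shifted left by 4
   and added to the offset. The 8086 has 20 address lines, so the sum
   is masked to 20 bits. (Hypothetical helper name.) */
static uint32_t physical_address_8086(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}
```

For example, with a segment of 0x5000 and an offset of 0x0590, `physical_address_8086` yields 0x50590, which is 329,104 in decimal. With no translation layer involved, that number is what goes out on the memory bus.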
Now, that didn’t always mean it actually went to RAM. Even the 8086 could have certain address regions mapped to things other than the main RAM. Regions might actually map to ROMs in the machine, or dedicated graphics memory on a graphics adapter. But in any region of memory that wasn’t handled specially, the addresses corresponded to physical RAM directly. This meant that if a program accessed a byte at address 329,104, it was going to ask the memory controller for byte 329,104 of main memory.
If we fast forward to today, we've gone through a lot of hardware and software changes to support features the 8086 lacked. Modern CPUs no longer treat computed addresses as direct RAM locations. x64 chips do still have effective address calculations (as we saw), and they do still have segment offsets in some circumstances. So those parts are fairly similar.
But unlike the 8086, modern x64 CPUs have to support virtual memory and memory protection. To accomplish this, they have an additional layer of address translation that happens after computing the linear memory offset specified by the program. In fact, the addresses we work with in our programs are actually just virtual addresses which serve as the inputs to this translation process.
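To make the idea of an extra translation step a little more concrete, here is a deliberately simplified toy model in C. Everything in it is an assumption made for illustration: the names, the 4096-byte page size, and especially the flat, four-entry table. Real x64 translation uses multi-level tables maintained by the OS and walked by the CPU. The only thing the sketch is meant to show is that the address a program computes is an input to a lookup, not a RAM location.

```c
#include <stdint.h>

#define TOY_PAGE_SIZE  4096u  /* assumed page size, for illustration */
#define TOY_PAGE_COUNT 4u     /* tiny address space to keep the toy small */

/* One toy "address space": maps each virtual page to a physical page.
   Hypothetical structure; real page tables are far more involved. */
typedef struct {
    uint64_t physical_page[TOY_PAGE_COUNT];
} toy_address_space;

/* Translate a virtual address into a physical one by looking up its
   page in the table and keeping the offset within the page unchanged.
   Returns UINT64_MAX if the toy table has no entry for that page. */
static uint64_t toy_translate(const toy_address_space *space,
                              uint64_t virtual_address)
{
    uint64_t page   = virtual_address / TOY_PAGE_SIZE;
    uint64_t offset = virtual_address % TOY_PAGE_SIZE;

    if (page >= TOY_PAGE_COUNT) {
        return UINT64_MAX;
    }
    return space->physical_page[page] * TOY_PAGE_SIZE + offset;
}
```

Given two `toy_address_space` tables with different contents, the same virtual address passed to `toy_translate` produces two different physical addresses, which is the property that lets each table describe its own separate view of memory.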
So, for every process running on a modern operating system, there is actually a dedicated virtual address space that is specific to that process: