Q&A #26 (2023-09-11)
Answers to questions from the last Q&A thread.
In each Q&A video, I answer questions from the comments on the previous Q&A video, which can be from any part of the course. Transcripts are not available for Q&A videos due to length. I do produce closed captions for them, but Substack still has not enabled closed captions on videos :(
Questions addressed in this video:
[00:24] “I've got an interesting behavior in a debug build where page faults don't seem to make program slower: In a release build it works as expected: file read with malloc on every operation becomes noticeably slower. I wonder what could explain this?”
[05:23] “I have an interesting situation that I haven't seen in the comments, that the number of page faults is low. The estimated kb/fault is about 902kb/fault, this seems very fishy to me, because using other tools I can confirm that the page size on my system is 4k. Is Linux trying to look ahead and guess that I will need more pages in the near future and mapping them?”
[06:58] “When you defined page faults, you also included attempts to access pages mapped as read-only for example. Wouldn't these faults be a distinct thing called protection faults instead in x86 vocabulary, and even have a different entry in the interrupt vector? Or are they really the same, and I'm mixing things up with something else?”
[09:11] “I have tested an assumption: If I allocate only 4kb of memory, then use the same memory to read the file in successive read. I would see a benefit of doing it, because of the page fault of course. But surprise, it's not the case. One read is significantly higher, but the overall read is /2.”
[11:41] “In the page faults video, you attribute the cost of page faults to the work the OS does to manage the contents of the page tables. Was that true only because it was freshly allocated uninitialized (or zeroed) memory? Otherwise, more generally, wouldn't that be negligible compared to the swapping between disk and main memory? I used to have this swapping picture in mind when thinking of page faults, so I never considered the overhead of the interrupt and managing the page tables to be the thing to worry about, before I saw the effect with your example.”
[14:54] “I have read that when two processes load pages from the same DLL, *and use shared memory* (FileMapping) they will share the same physical memory address.
Until, of course, if one process actually writes to the physical memory.
If Process A, sharing the same physical memory space with process B (for the DLL), writes something in the memory space, how does the OS handle that?
Does it have to:
1. Copy the value it was before the process A write
2. Map a new physical memory address in process B and copy the value to new physical memory space
If something like that happens, then they are not sharing the same physical memory space now, not completely at least.”
[18:50] “I added a sleep of one second to each run of the benchmark so I could quickly check whether the first run was actually the one that was slowest, and whether it immediately jumped up to near the min speed after that. It yields some results that surprised me, where the first several runs were slow, and then it settles into a pattern where the first run of each test is the fastest, but then it slows down dramatically, and there are no additional page faults. I checked that I'm not including the second in any of the timings that are actually used. I'm still getting used to the tooling on mac to investigate this kind of thing, but in the meantime do you have any ideas?”
[22:55] “Could we reuse the same virtual address even when we free the memory? If we were able to keep track of pages, even after a free, and reuse it, we could still keep PF very low, isn't it?”
[27:02] “I became confused about memory initialization when you mentioned that Linux will map to a page of 0 values when you allocate. I thought that malloc doesn't initialize to any 0 value, but calloc does (though testing on macos I see that reading after a small malloc everything is set to 0). Does it just depend on the implementation whether malloc will initialize the values?”
[29:15] “How does memory management for page faults happen with the stack? Does the operating system check that the memory access was related to a stack location and map a page there?”
[33:40] “Why are the reads able to run faster than the write byte loop? Is it because in the write bytes we are doing one byte at a time whereas the reads might be loading the data in larger chunks or is there something else going on?”
[34:54] “Any pros/cons to using the OS to get page fault (or other) counters vs going to the CPU directly, like reading from an MSR?”
[36:56] “I'm confused by the Linux virtual memory optimization you talked about - where it maps new pages to the zero page, then upon writing, remaps them to physical memory. I don't understand how this is possible - are there separate read/write memory tables in the CPU?”
[39:16] “If virtual memory is mapped to physical memory in pages by the OS "lazily", does this cause problems if you're trying to allocate blocks of memory to optimize for locality - all the data being stored close in physical memory? I.e. if you're trying to avoid memory fragmentation, does the way various operating systems map virtual to physical memory present challenges and is there anything we can do about it? I'm thinking of the case where you would allocate a block of memory for a collection of things, in preference to allocating memory for each item in the collection piecemeal.”