Monday Q&A #21 (2023-07-31)
Answers to all the questions that came in last week...
Each Monday I answer questions from the comments on the prior week’s Q&A video, which can be from any part of the course. Transcripts are not available for Q&A videos due to length. I do produce closed captions for them, but Substack still has not enabled closed captions on videos :(
Questions addressed in this video:
[00:13] “From the ‘Estimating Cycles’ part of the course, Aaron posted the following question, which I would also like to know the answer to:
For the instruction
mov cx, [bp] ; Clocks: +13 = 62 (8 + 5ea) | ip:0x17->0x1a
the EA calculation is listed as taking only 5 cycles.
mov CX, [BP] is encoded in the binary as 10001011_01001110_00000000, which decodes to mov CX, [BP + 0] and would mean the effective address takes 9 cycles instead of 5. Is the CPU optimising the +0 away?
Also, as a separate question, will you be posting the VOD of your Geometric Algebra discussion with Hamish Todd?”
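To make the encoding in this question concrete, here is a minimal sketch that splits the mod/reg/rm byte and looks up the effective-address clocks. The function names are hypothetical, and the clock counts are taken from the EA table in the 8086 user's manual as commonly reproduced; it performs a literal table lookup and does not model whatever the hardware actually does with a zero displacement.

```python
# Hypothetical helper names; clock counts follow the 8086 manual's EA table.

def decode_modrm(modrm):
    """Split a mod/reg/rm byte into its three fields."""
    mod = (modrm >> 6) & 0b11
    reg = (modrm >> 3) & 0b111
    rm = modrm & 0b111
    return mod, reg, rm

def ea_clocks(mod, rm):
    """Estimated EA clocks for a memory operand (mod != 0b11)."""
    if mod == 0b00 and rm == 0b110:
        return 6  # direct address: displacement only
    has_disp = mod in (0b01, 0b10)
    if rm >= 4:
        # single base or index register: [si], [di], [bp], [bx]
        return 9 if has_disp else 5
    # base + index: [bx+si] and [bp+di] are one clock cheaper than the other pairs
    base = 7 if rm in (0b000, 0b011) else 8
    return base + 4 if has_disp else base

# Second byte of the encoding quoted in the question
mod, reg, rm = decode_modrm(0b01001110)
print(mod, reg, rm, ea_clocks(mod, rm))  # prints "1 1 6 9"
```

Because mod = 00 with rm = 110 means "direct address", there is no displacement-free encoding of [bp], which is why the assembler emits mod = 01 with an 8-bit displacement of zero. A literal lookup therefore gives 9 clocks, while the listing in the question shows 5.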
[29:24] “My initial attempt for the profiler was already using a tree structure to be able to do a full call-stack attribution. Each time I open a new block, it is registered as the child of its parent (if any).
Is there a reason for avoiding this, as compared to the hash thing you mentioned iprof is doing? Should this tree creation / manipulation in the profiler be avoided for performance reasons?”
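For readers who have not seen the tree approach described in this question, the following is a rough sketch of the idea, not the questioner's code and not the course's profiler. The names Node and time_block are hypothetical, and time.perf_counter_ns stands in for whatever timer a real profiler would read.

```python
import time
from contextlib import contextmanager

class Node:
    """One profiled block at one position in the call tree."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}   # child name -> Node, so repeated calls share a node
        self.elapsed_ns = 0  # inclusive time, children included
        self.hits = 0

root = Node("root")
current = root  # the block that is open right now

@contextmanager
def time_block(name):
    """Open a block as a child of whichever block is currently open."""
    global current
    parent = current
    node = parent.children.get(name)
    if node is None:
        node = Node(name, parent)
        parent.children[name] = node
    current = node
    start = time.perf_counter_ns()
    try:
        yield node
    finally:
        node.elapsed_ns += time.perf_counter_ns() - start
        node.hits += 1
        current = parent  # closing a block makes its parent current again

with time_block("parse"):
    for _ in range(3):
        with time_block("read_number"):
            pass

print(root.children["parse"].children["read_number"].hits)  # prints 3
```

The per-block cost in this sketch is one child lookup when a block opens plus two timer reads, which is the kind of overhead the question is asking about.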
[36:00] “Now that I have my decoder (written in Rust), I've been trying to decode the binary for my 8086 real-time operating system from college, and I'm running into the issue where I can't tell the difference between 8086 instructions and data (DB, DD, etc.). What's the best way to approach this? Or is it basically a lost cause? The arbitrary data makes it so I can't verify that I decoded the file in the same way as I did with the class listings.”
[39:01] “Why not just write a stream of events (ex1: function start time, ex2: function end time, ex3: frame start/end, etc.) and collate the hierarchy at the end? Sounds like it would have less overhead at runtime and more at processing time?”
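As an illustration of what "collate the hierarchy at the end" could look like, here is a small sketch under assumed conventions: events are (kind, name, timestamp) tuples recorded in order, and begin/end events are properly nested. The collate function and the event layout are hypothetical, not a format used in the course.

```python
def collate(events):
    """Turn an ordered begin/end event stream into inclusive time per call path."""
    totals = {}  # call path (tuple of names) -> total inclusive time
    stack = []   # currently open blocks as (name, start timestamp)
    for kind, name, timestamp in events:
        if kind == "begin":
            stack.append((name, timestamp))
        else:
            path = tuple(open_name for open_name, _ in stack)
            open_name, start = stack.pop()
            assert open_name == name, "unbalanced begin/end events"
            totals[path] = totals.get(path, 0) + (timestamp - start)
    return totals

events = [
    ("begin", "frame", 0),
    ("begin", "parse", 10),
    ("end", "parse", 40),
    ("begin", "parse", 50),
    ("end", "parse", 60),
    ("end", "frame", 100),
]
print(collate(events))  # prints {('frame', 'parse'): 40, ('frame',): 100}
```

At runtime this approach only appends to the event list, but the list grows with the number of calls rather than the number of distinct blocks, which is one side of the trade-off the question raises.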
[44:12] “Is it worth trying to exclude some estimated overhead time from the captured timings to make final numbers closer to the version without the profiler?
The goal would be to time a small function that is called a lot of times from different big regions and see if that function is a performance problem in itself, regardless of the context.”
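One way to picture the correction this question proposes is sketched below. It is only an illustration of the idea, with hypothetical function names: it estimates the fixed cost of a pair of back-to-back timer reads and subtracts that cost once per hit. It models nothing beyond those timer reads, so any other effect the profiler has on the measured code is not removed.

```python
import time

def estimate_timer_overhead_ns(samples=10000):
    """Smallest observed gap between two back-to-back timer reads."""
    best = None
    for _ in range(samples):
        start = time.perf_counter_ns()
        end = time.perf_counter_ns()
        delta = end - start
        if best is None or delta < best:
            best = delta
    return best

def corrected_ns(measured_ns, hits, overhead_ns):
    """Subtract the estimated per-hit timer cost, clamping at zero."""
    return max(0, measured_ns - hits * overhead_ns)

# Made-up numbers: 1000 ns measured over 10 hits with 30 ns overhead per hit
print(corrected_ns(1000, 10, 30))  # prints 700
```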
[48:13] “So this isn't super related to the course, but it might affect other people and I have no idea what causes it: Sometimes my program stalls at the printfs in the end that print the Haversine results. I have to press enter for it to actually finish. I had the program running for a few minutes and I thought that it didn't seem right, until I randomly pressed enter in the cmd and it immediately printed the results. The profiler showed the output taking most of the time due to that. It doesn't happen every time either, which makes it extra weird. Any idea what might be causing it, or how I can even try to debug something like this?”
[50:21] “Casey, if the JSON didn't fit in RAM, would we have to be concerned about the OS paging? If so, what would we have to do to deal with it?”
[51:33] “My CPU as reported by cpuid is ‘11th Gen Intel(R) Core(TM) i5-11600K @ 3.90GHz’. When I run the code to estimate TSC frequency, the result is 3912002497Hz (3.91GHz). Even though that's pretty close, it's not exact. I was wondering if that's still close enough to the real frequency that it doesn't matter. It's still a lot closer to the real frequency than 10MHz as reported by the OS, right? But if I remember correctly from your video, your estimate was much closer, something like to the 7th significant digit. Any ideas why your estimate seems more accurate than mine?”
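For scale, the gap described in this question can be worked out directly. The short sketch below uses only the two numbers quoted in the question; treating the brand-string value as the reference is an assumption, since the brand string is rounded to two decimal places of GHz.

```python
nominal_hz = 3_900_000_000   # "3.90GHz" from the brand string in the question
measured_hz = 3_912_002_497  # TSC frequency estimate quoted in the question

diff_hz = measured_hz - nominal_hz
relative = diff_hz / nominal_hz
print(f"difference: {diff_hz} Hz ({relative:.2%} of nominal)")
# prints "difference: 12002497 Hz (0.31% of nominal)"
```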