In each Q&A video, I answer questions from the comments on the previous Q&A video, which can be from any part of the course. Transcripts are not available for Q&A videos due to length. I do produce closed captions for them, but Substack still has not enabled closed captions on videos :(
Questions addressed in this video:
[00:03] “Every time new hardware comes along, we see people run popular benchmark software - Geekbench, Unigine Heaven, Cinebench. What is special about software that is used for benchmarks? How does it utilize the hardware evenly or fairly? Are they designed to take advantage of the latest features in hardware? Do you put any stock into results? Why or why not?”
[06:29] “Hey Casey. Will you tackle the difference between RISC and CISC and how to program these differently? RISC-V is a promising ISA and would be something that I might work with in the future, especially on servers.”
[18:50] “What is the reason to prefer multi-byte NOPs to repeated single-byte NOPs? The binary size will be the same, right? It's just the asm which is shorter? Is it less work for the CPU to decode it? I just saw from the manual that single byte NOP is actually a XCHG (E)AX, (E)AX, whereas multi-byte NOPs seem to be ‘real’ NOPs?” /
“At first glance NOP instruction seems useless, I see the point in embedded systems, but how useful are they in platforms like Windows? It seems like it's used in order to time a specific instruction like on this particular exercise it may tell us that a mov instruction is sort of equivalent to a NOP 3 bytes instruction.”
[23:53] “Implementing the assembly repetition tests, they report that I am writing at 4.3gb/s even though the CPU frequency timing function reports 3.7GHz, how can I write bytes faster than the reported frequency? I figured out that 3.7GHz is the base frequency and it can go up to 4.6GHz. If I take this maximum frequency, then the result of 4.3gb/s makes sense.
So the question is: How can I figure the maximum CPU frequency using code? Is using CPUID instruction the only way?” /
“When I run these tests on Ryzen 5 3600, all of them have throughput less than 1 cycle. It seems to happen because CPU frequency estimation captures the base clock frequency, and the clock is boosted during the test.
Should we target the maximum frequency in benchmarks like this, or keep the min..max frequency range in mind?”
[28:12] “Another question: When you tested MOVAllBytes and NOPAllBytes we can see that the throughput increased because you are not doing a memory move. In my case all of the write bytes variants have the same speed. Is this normal? Is this speed increase CPU specific? (I double-checked that the assembly code is not the same for all cases)”
[29:54] “Why has the 'OOP' mentality of getters and setters become so pervasive? I tried that approach out in college years ago, and very quickly rejected it on the basis of 'this is creating a lot of needless work for me' and 'this is making the code way more verbose for no reason'.”
[33:11] “How do function calls & returns fit into the frontend pipeline? Do they show up to the backend in a similar way to jumps where the frontend just pushes the next instruction after the call/return into the queue? Are there any boundaries other than mispredicted branches where the frontend has to either wait or throw away work it's already done?”
[38:24] “In the ABI specification, we found only 4 arguments possible (4 registers). Where do the others go when a function has more than 4 arguments? I presume the answer is that they are pushed onto the stack.”
[38:47] “Can you explain in more detail what is going on under the hood when creating a piece of code you want others to use from another program (on Windows and with C)? Like the one you created in part 1. And explain how two pieces of code written in different programming languages can interoperate?”