Q&A #35 (2023-11-20)
Answers to questions from the last Q&A thread.
In each Q&A video, I answer questions from the comments on the previous Q&A video, which can be from any part of the course. Transcripts are not available for Q&A videos due to length. I do produce closed captions for them, but Substack still has not enabled closed captions on videos :(
Questions addressed in this video:
[00:11] “I work in games (we use Unity) and my main interest is GPU programming, I already have a solid grasp on the basics but I really want to get a deeper understanding and improve my knowledge and be better at optimizing both for and on the GPU. I was wondering if you could point me in a good direction for further learning?”
[06:32] “From the latest dependency chains video : I have hard time to understand how the mov, cmp and (next) inc can run in parallel. If the next inc is completed first, then the mov and cmp will be moving and comparing a register that would have been already incremented ?
Unless it happens like the mov and cmp instructions have the ability to work on their own "copy" of the rax register that is not changed even if the inc happens in the meantime ?” / “I wonder why the inc in the same iteration doesn't depend on the mov. Can the CPU change a register before the previous read from it finished? Is it due to RAT mentioned in other comments?” / “Based on the register allocation explanation and OS register swapping. Does that mean that the OS has to flush the CPU side register table on every thread swap? Or does the CPU do some clever tagging to know it can't reuse values across threads?”
[21:11] “What are the best strategies to get consistent, good performance with non-homogeneous CPU core structures such as big.LITTLE and P- and E-Cores? Are there things that we can learn from the PS3 developer experience? Is this something you plan to cover later in the course?”
[24:12] “I have a question about dependency chains, in the newest post you said that the next inc instruction depends on the current inc instruction. But why doesn't inc depend on the result of the branch?”
[34:00] “How do you balance deep interest in something without being discouraged if it seems quite useless (not talking about this course)?”
[37:26] “On the laundry / dryer / us analogy, we still need time allocation in order to take objects inside them and bring it to another one. It seems that CPU will need time in order to push machine code instruction to the data bus, translate it and maybe more.
I'm wondering where the time to do these hardware operations are applied?”
[43:20] “Not particular to this course, but I've been wondering for a while: why are GPU draw calls so expensive?”
[47:13] “Regarding latency and throughput - I thought I understood the single wash + dryer example, but this new example shook my confidence a little. So we're doing the example with a single load, and now the reciprocal throughput is the same as the latency. Is that an accurate measurement of throughput? If so, that means that the measurement of throughput can change depending upon how many ‘load’s are added to the system right?”