In each Q&A video, I answer questions from the comments on the previous Q&A video, which can be from any part of the course.
Questions addressed in this video:
[00:03] Going over the challenge homework from last week.
[12:38] “Could you please clarify the notion of a false dependency?
After watching the video, I got the impression that a false dependency is when we don't read from a register. Such a dependency doesn't actually create a dependency chain.
But the AMD manual says the reason for the slower 'mov al, [rdx]' is a false dependency on the previous rax value. Why do they call it a 'false' dependency when its effects are real?”
[16:46] “On the homework of the execution ports video, when we're making a single read per loop, shouldn't the processor be able to use both ports by doing reads of different iterations of the loop in parallel given that it does speculative execution?”
[23:48] “I ran the read test on a Zen3 machine and found that performance increases linearly up to 5 memory reads per iteration, and then increases further up to 7 loads per iteration, albeit marginally.
However, the Zen3 documentation says that the core can do 3 loads per cycle. Am I seeing some other effect that will become clear later in the course?”
[25:35] “Regarding execution ports, is a SIMD operation like 'add 4 f32 numbers' handled as submitting 4 micro-ops to 4 'add' ports, or are there separate SIMD ports? Does this vary between CPUs?”
[28:52] “In relation to the discussion in the video, do you see any value in a computer architecture where the entire thing is only a 'GPU' plus memory, which runs the OS and all software as well as rendering graphics?”