The full episode is only available to paid subscribers of Computer, Enhance!

Q&A #43 (2024-02-06)

Answers to questions from the last Q&A thread.

In each Q&A video, I answer questions from the comments on the previous Q&A video. Questions can be about any part of the course.

Questions addressed in this video:

  • [00:03] Going over the challenge homework from last week.

  • [12:38] “Could you please clarify the notion of a false dependency?

    After watching the video, I got the impression that a false dependency is when we don't read from a register. Such a dependency doesn't actually create a dependency chain.

    But AMD manual says the reason for slower 'mov al, [rdx]' is a false dependency on previous rax value. Why do they call it a 'false' dependency when its effects are real?”

  • [16:46] “On the homework of the execution ports video, when we're making a single read per loop, shouldn't the processor be able to use both ports by doing reads of different iterations of the loop in parallel given that it does speculative execution?”

  • [23:48] “I ran the read test on a Zen3 machine and found that performance increases linearly up to 5 memory reads per iteration, and then marginally further up to 7 loads per iteration.

    However Zen3 documentation says that the core can do 3 loads per cycle. Am I seeing some other effect that will become clear later in the course?”

  • [25:35] “Regarding execution ports, is a SIMD operation like 'add 4 f32 numbers' handled as submitting 4 micro-ops to 4 'add' ports, or are there separate SIMD ports? Does this vary between CPUs?”

  • [28:52] “In relation to the discussion in the video, do you see any value in a computer architecture where the entire thing is only a 'GPU' plus memory, which runs the OS, and all software as well as rendering graphics?”
