The full episode is only available to paid subscribers of Computer, Enhance!

Q&A #37 (2023-12-04)

Answers to questions from the last Q&A thread.

In each Q&A video, I answer questions from the comments on the previous Q&A video, which can be from any part of the course. Transcripts are not available for Q&A videos due to length. I do produce closed captions for them, but Substack still has not enabled closed captions on videos :(

Questions addressed in this video:

  • [00:02] “When working with a CPU, we use many of its properties in the code - frequency for benchmarking, number of cores, architecture etc. But what about memory? Where can we use knowledge about memory frequency, its DDR-ness, timings?”

  • [07:14] “How do you stay informed on the most important changes that come with new chip architectures? Do you just skim the CPU manuals from time to time or is there some website/blog etc. you could recommend?”

  • [08:20] “I see that there is some complexity in the frontend around parallelizing decoding, in part due to the fact that x64 is a variable length instruction set, and it has to guess where the beginning of the next instruction will be. Given that some of those attempts are going to end up incorrect, I imagine that the CPU sacrifices some throughput to those mistakes.

    So would a fixed length instruction set be faster for a CPU front end to decode?

    Though, I'm guessing that means that all instructions would have to be as long as the longest variable length instruction and so the wasted L1 instruction cache space would end up becoming the bottleneck instead, and that would likely be a losing trade.. Is that right?”

  • [13:34] “I've been writing tests in an attempt to measure branch-predictor overhead. This style of programming/testing is new to me, so I wanted to get your opinion on the methodology.

    The tests use two assembly functions that take a boolean argument (in the rdi register for my arch), and returns one of two values. The only difference between the functions is whether it compares rdi with 0 or 1.”

  • [14:44] “I was wondering if you might expand on why you think RISC-V is not a good ISA?”

  • [19:26] “I've been wondering what is the historical context for why SIMD fmadd/fmsub instructions exist. Is it because fmadd/fmsub are so common in math that the hardware manufacturers obliged, or is the adder circuit somehow "for free" when we do a multiply, and they just decided to expose the capability?”

  • [22:52] “Can you answer with an open mind the question of: why someone would need other scripting tool for build than bat or shell script that these tools can't provide? (thinking CMake and others)”

  • [27:40] “I've tried everything, but the test performance on ASM procedures seems odd (but comparison between them always stand).

    1. Every time I run the "program", I will have very different result (like sometime 2x more performant in minimum).

    2. Difference between the max and min is very high.

    Is it related to this course and how CPU behave? I'm thinking of something like alignment (like previous person suggest). Like if address alignment would matter.”

  • [28:57] “How does one start figuring out why the assembly code is performing slower than expected?”

  • [29:27] “What is the difference between switch statement and if statement in terms of assembly? I see so many people on the internet saying never use the switch statement. Both of them are just jumps, aren't they?”

  • [34:36] “Since we are discussing frontend. What's up with VLIW? It's an attractable idea to offload frontend work to the compilers, but apparently it didn't work that well. Is super scalar OoO just so much better for general-purpose CPU?

    What's up with other types of chips? You mentioned some GPUs have VLIW. Could you elaborate on that?”

