
The full episode is only available to paid subscribers of Computer, Enhance!

Q&A #50 (2024-04-08)

Answers to questions from the last Q&A thread.

In each Q&A video, I answer questions from the comments on the previous Q&A video, which can be from any part of the course.

Questions addressed in this video:

  • [00:02] “When talking about cache optimization you mentioned that it is usually faster to do all the work in a single pass. This makes sense, but I was wondering whether that is also the case with operations that you only run sometimes. An example of this would be a game where we store all data for an Entity in one big struct, and we update and draw at different intervals: we run multiple updates per rendered frame, but iterate over the same data. Would it then be faster to have a branch for drawing that you only take sometimes?”

  • [03:24] “Do modern systems tend to all have a specific scheduler type and is it important to understand how they work to try and maximise performance?”

  • [06:57] “I think that arrays have some drawbacks too. You explained that when deciding which ‘row’ in the cache to pick, a bit pattern is taken from the address. You also said that it is taken from the *middle* of the address and that the low bits are usually ignored. But the addresses of consecutive array elements usually differ only in the lower bits, right?

    This effectively means that most of our array could get mapped to the same ‘row’, which has limited capacity (like 8 cache lines). So if too many of the low address bits are excluded from the mask, the whole array would get assigned to a single ‘row’. If the size of the array is larger than the size of the row, this would mean that only a small portion of the array would be cached, wouldn't it?”

  • [16:53] “In going through your code examples I've noticed you don't really have header files most of the time. It seems you often just have .cpp files and include them in other files. Why is that? When do you need them or what are they useful for?”

  • [21:53] “One question regarding the "do not split loops" advice.

    I frequently have the following situation: I have a big array A of a simple data type (say float or a struct of floats) and I have to apply several transformations to each element in A. The intuitive "clean code" approach would be to dedicate an independent function to each such transformation, with each function doing the full loop over A.

    If I choose to avoid spurious iteration, I can try to group the transformations within one loop where possible. If I do this, I end up with a huge number of things within my single loop, which is not very readable. An obvious solution would be to create functions that operate on the elements of A, but I'm afraid this would kill performance for two reasons (as far as I can see): a huge number of function calls (should I inline?), and losing the opportunity to use SIMD to process several x's in parallel.”

  • [25:47] “When I started doing the homework for the Branch Prediction lesson, my program crashed on this command:

    mov r10, [rdx + rax]

    The crash occurred 7 bytes from the end of the buffer, and thinking about it, it made sense: the mov is trying to read 8 bytes at a time, and when rdx + rax reaches the final 8 bytes of the buffer, the mov will try to grab extra memory outside of the buffer.

    I then looked at Casey's program to see what he does to handle this case, and to my surprise, there was nothing; the only difference I could see is that he was using malloc, and I was using VirtualAlloc. Surely this couldn't be it? But it turned out that was indeed the culprit. Replacing with malloc solved the problem.

    So what's going on here?”
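
As a rough illustration of the arithmetic behind the [06:57] question, here is a minimal sketch of how a set-associative cache might pick a ‘row’ (set) from an address. The geometry is an assumption for illustration only (64-byte cache lines and 64 sets, as in a typical 32 KB 8-way L1), and the names `cache_set_index` and `distinct_sets_touched` are hypothetical; this is not taken from the video. Under these assumptions, the only low bits that are ignored are the 6 bits that select a byte within a line, so consecutive cache lines of an array land in consecutive sets rather than piling into one.

```c
#include <stdint.h>

/* Assumed geometry (illustrative, not from the video):
   64-byte lines -> 6 offset bits, 64 sets -> 6 index bits. */
enum { LINE_BITS = 6, SET_BITS = 6 };

/* Drop the byte-within-line bits, then keep the next SET_BITS bits. */
static uint64_t cache_set_index(uint64_t address)
{
    return (address >> LINE_BITS) & ((1ull << SET_BITS) - 1);
}

/* Count how many distinct sets a contiguous byte range touches. */
static unsigned distinct_sets_touched(uint64_t base, uint64_t byte_count)
{
    if (byte_count == 0) return 0;

    uint64_t first_line = base >> LINE_BITS;
    uint64_t last_line = (base + byte_count - 1) >> LINE_BITS;

    uint64_t seen = 0; /* one bit per set; works because there are 64 sets */
    for (uint64_t line = first_line; line <= last_line; ++line) {
        seen |= 1ull << cache_set_index(line << LINE_BITS);
    }

    unsigned count = 0;
    while (seen) { seen &= seen - 1; ++count; } /* clear lowest set bit */
    return count;
}
```

With this geometry, `distinct_sets_touched(0, 4096)` covers all 64 sets, while two addresses exactly 4096 bytes apart share a set. So in this sketch it is large power-of-two strides, not contiguous arrays, that concentrate accesses in a single ‘row’.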

