
Paid episode

The full episode is only available to paid subscribers of Computer, Enhance!

Q&A #40 (2024-01-18)

Answers to questions from the last Q&A thread.

In each Q&A video, I answer questions from the comments on the previous Q&A video, which can be from any part of the course. Transcripts are not available for Q&A videos due to length. I do produce closed captions for them, but Substack still has not enabled closed captions on videos :(

Questions addressed in this video:

  • [00:02] “Following up on the ‘porting a piece of software designed for a particular CPU, console, etc.’ topic: why were the 2023 console-to-PC game ports so bad? If the hardware is so similar, what went wrong? Lack of experience? Time?”

  • [13:57] “From a performance point of view, is there a place for fixed-point arithmetic on modern CPUs?”

  • [17:30] “I like writing software renderers as a hobby. There, I usually find myself struggling with memory access. Are we going to see in this course some techniques that help in memory-constrained situations, or when access to memory is not linear (for example, mapping a rotated texture)?”

  • [20:03] “How would you approach transitioning from web development to embedded development with like 3 years of total professional experience and no CS/CE degree?”

  • [23:50] “My project relies on a massive amount of ~random access data that is static at runtime and accessed through a hash map. I figured this might be sped up and memory reduced by just packing the keys and data separately and as tightly in memory as possible, sorting the keys, and binary iterating the array. Turns out, I was right! 500k entries searched in random order gave a ~4x speedup, and 1/2 the memory usage!

    But I thought it might be able to be improved. The first traversals of the binary iteration are hopping all over memory, so I figured I'd throw the first 3 jumps in a tight array that fits in a single cache line (and have that index the next 3 jumps that fit in a single cache line...)- essentially a B tree.

    The result: no improvement whatsoever. This seems like it should be such an obvious win, and I'm surprised it isn't. Do you have any thoughts about what might be going wrong, or where I might focus profiling to remedy the situation?

    Another confusing result: I also tried storing the keys and values as a single array of structs rather than each independently (for experimentation), and was surprised that there wasn't any performance loss. Same questions apply here!”

  • [39:42] “I can't seem to reproduce a performance penalty from code alignment similar to the results you got when you ran your tests. I am running my tests on a Linux machine with a 10-year-old i7-4790K (Haswell) CPU.”

