Q&A #75 (2025-05-02)

Paid episode

The full episode is only available to paid subscribers of Computer, Enhance!

Answers to questions from the last Q&A thread.

May 02, 2025 ∙ Paid

In each Q&A video, I answer questions from the comments on the previous Q&A video, which can be from any part of the course.

The questions addressed in this video are:

[00:03] “You mentioned in the discussion about indices that they are useful for serialization, since we can flat store and load the in-memory data, but let's say we stored the data on a big-endian machine but want to load it on a little-endian machine (e.g. transferring game saves). We can no longer simply read all the data in; instead, we need to byteswap every relevant piece. How would you approach this?”
[11:05] “I recently wrote basic loaders for wav, aiff and midi files, and they all seem to use 'chunks', i.e. they say the 'type' and 'size' of the next 'chunk' of data to come. Since these formats are pretty old, I wonder whether this is still the best way to organize file formats. Have you designed custom formats, and if so, what would you say are good practices?”
[26:19] “I worked only with C in this course, but I've become a bit frustrated with the quality of the error messages the cl compiler gives me. I know you use cpp files, and cl gives much better errors when compiling as C++ (or with the /TP flag). Do you have any recommendations? Would you recommend using cpp files (sticking to C style) and not using any C-specific features, or am I missing something?”
[32:22] “Regarding the bts question asked in QA#72, my instinct is to worry about false sharing, as the code is not using atomic writes. As far as I understand, x64 at least has strong memory ordering guarantees, so we don't have to worry about load-acquire and store-release fences here, whereas on ARM / RISC-V we would have yet another cause of data races. I would split the bits out to an owned cache line per thread in order to avoid a serial dependency chain for the bitmap. Is there anything I'm missing in my analysis? How do you tend to structure multithreaded gathers like this? Oh, and is there any clever way you know of to write C code that mitigates false sharing?”
[48:00] “N00b question: Why is it worth studying a CPU pipeline diagram? What information can one get there that is not available in other sources?”
[53:38] “Have you tried including and using quadmath.h and f128 for the haversine computation to get a feel for how many digits are lost in the computation? I guess converting to f32 and back at each computational step would show the same thing, and the problem is greatest a small distance from the maximum.”
[56:10] “Maybe I missed it, but why can't we use _mm_sin_ps and _mm_asin_ps?”
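For the endianness question at [00:03], one common approach is for the writer to store a known magic value in its native byte order. The loader can then tell from that value's byte pattern whether the file came from a machine of the opposite endianness, and if so, swap each field once on load. This may not be the approach discussed in the video. The sketch below assumes a hypothetical save layout; `save_header`, `save_entity`, `SAVE_MAGIC`, and all function names are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical save-file layout, for illustration only. The writer stores
   every field in its own native byte order, including the magic value. */
#define SAVE_MAGIC 0x53415645u /* 'SAVE' */

typedef struct
{
    uint32_t Magic;
    uint32_t EntityCount;
} save_header;

typedef struct
{
    uint32_t ID;
    float X, Y;
    uint16_t Health;
    uint16_t Flags;
} save_entity;

static uint16_t Swap16(uint16_t V)
{
    return (uint16_t)((V >> 8) | (V << 8));
}

static uint32_t Swap32(uint32_t V)
{
    return (V >> 24) | ((V >> 8) & 0x0000FF00u) |
           ((V << 8) & 0x00FF0000u) | (V << 24);
}

/* Swap a float through its bit pattern, so the foreign-order bits are
   never loaded as a float value. */
static void SwapF32InPlace(float *F)
{
    uint32_t Bits;
    memcpy(&Bits, F, sizeof(Bits));
    Bits = Swap32(Bits);
    memcpy(F, &Bits, sizeof(Bits));
}

/* Returns 1 if the header is recognized (converting it to native order if
   needed) and sets *NeedsSwap for the rest of the file; returns 0 otherwise. */
static int FixupHeader(save_header *Header, int *NeedsSwap)
{
    if(Header->Magic == SAVE_MAGIC)
    {
        *NeedsSwap = 0;
    }
    else if(Header->Magic == Swap32(SAVE_MAGIC))
    {
        *NeedsSwap = 1;
        Header->Magic = SAVE_MAGIC;
        Header->EntityCount = Swap32(Header->EntityCount);
    }
    else
    {
        return 0;
    }
    return 1;
}

/* One pass over the flat array, field by field. */
static void SwapEntities(save_entity *Entities, uint32_t Count)
{
    for(uint32_t Index = 0; Index < Count; ++Index)
    {
        save_entity *E = &Entities[Index];
        E->ID = Swap32(E->ID);
        SwapF32InPlace(&E->X);
        SwapF32InPlace(&E->Y);
        E->Health = Swap16(E->Health);
        E->Flags = Swap16(E->Flags);
    }
}
```

With this design, the load path is still a single bulk read when the byte orders match, and only the mismatched case pays for the swap pass. Another option is to fix one byte order on disk and always convert on platforms that differ.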
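The chunked layout asked about at [11:05] is the one defined by RIFF (used by WAV) and IFF (the basis of AIFF): a four-character ID, a 32-bit size, and then the payload. A practical benefit is that a reader can skip chunk types it does not recognize. The following is a minimal sketch of a RIFF-style walker over an in-memory buffer. It assumes little-endian sizes and RIFF's rule of padding odd-sized chunks to an even length. AIFF and Standard MIDI Files store their sizes big-endian, so `ReadLE32` would need a big-endian counterpart there. For a WAV file, the walker would start after the 12-byte `RIFF`/size/`WAVE` header. The type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical types for illustration. */
typedef struct
{
    char ID[4];          /* four-character code, e.g. "fmt " or "data" */
    uint32_t Size;       /* payload size in bytes, excluding padding */
    const uint8_t *Data; /* points into the caller's buffer */
} chunk;

typedef struct
{
    const uint8_t *At;
    const uint8_t *End;
} chunk_iter;

static uint32_t ReadLE32(const uint8_t *P)
{
    return (uint32_t)P[0] | ((uint32_t)P[1] << 8) |
           ((uint32_t)P[2] << 16) | ((uint32_t)P[3] << 24);
}

/* Returns 1 and fills *Chunk if a complete chunk is available;
   returns 0 at the end of the data or if the chunk is truncated. */
static int NextChunk(chunk_iter *Iter, chunk *Chunk)
{
    size_t Remaining = (size_t)(Iter->End - Iter->At);
    if(Remaining < 8)
    {
        return 0;
    }

    memcpy(Chunk->ID, Iter->At, 4);
    Chunk->Size = ReadLE32(Iter->At + 4);
    if(Chunk->Size > Remaining - 8)
    {
        return 0;
    }
    Chunk->Data = Iter->At + 8;

    /* Skip the header, the payload, and the pad byte after an odd-sized
       payload; tolerate a missing pad byte on the final chunk. */
    size_t Advance = 8 + (size_t)Chunk->Size + (Chunk->Size & 1);
    if(Advance > Remaining)
    {
        Advance = Remaining;
    }
    Iter->At += Advance;
    return 1;
}
```

A caller loops on `NextChunk`, switches on the ID, and ignores IDs it does not know. This lets older readers cope with files that contain newer chunk types.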
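Regarding the bitmap question at [32:22]: if several threads perform a plain, non-atomic read-modify-write (such as `|=` or a non-locked `bts`) on the same 64-bit word, updates can be lost even on x64, because strong memory ordering does not make a read-modify-write atomic. False sharing is a separate, performance-only cost that arises when threads write different words on the same cache line. One way to avoid both, along the lines the commenter suggests, is to give each thread a private bitmap that is aligned and padded to a cache line, and then OR the bitmaps together after the threads are joined. This may not be the structure recommended in the video. The 64-byte line size is an assumption: it is common on x64, but some ARM cores, such as Apple's, use 128 bytes. The bitmap size and all names are hypothetical, and thread creation is omitted.

```c
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 64-byte cache lines. */
#define CACHE_LINE_SIZE 64
/* Assumption: the bitmap covers 512 bits, i.e. exactly one 64-byte line. */
#define BITMAP_WORDS 8

/* Aligning the first member aligns the whole struct, and sizeof rounds up
   to a multiple of that alignment, so in an array of these no two threads'
   bitmaps share a cache line. */
typedef struct
{
    alignas(CACHE_LINE_SIZE) uint64_t Bits[BITMAP_WORDS];
} thread_bitmap;

_Static_assert(sizeof(thread_bitmap) % CACHE_LINE_SIZE == 0,
               "thread_bitmap must fill whole cache lines");

/* Called only by the thread that owns B, so a plain OR is safe.
   Assumes Index < BITMAP_WORDS * 64. */
static void SetBit(thread_bitmap *B, uint32_t Index)
{
    B->Bits[Index / 64] |= (uint64_t)1 << (Index % 64);
}

/* Called once, after all worker threads have been joined; the join is what
   makes the per-thread writes visible here. */
static void MergeBitmaps(const thread_bitmap *PerThread, uint32_t ThreadCount,
                         uint64_t Dest[BITMAP_WORDS])
{
    memset(Dest, 0, BITMAP_WORDS * sizeof(uint64_t));
    for(uint32_t T = 0; T < ThreadCount; ++T)
    {
        for(uint32_t W = 0; W < BITMAP_WORDS; ++W)
        {
            Dest[W] |= PerThread[T].Bits[W];
        }
    }
}
```

If the work can be split by bit index, a variant is to give each thread a disjoint, line-aligned range of words in one shared bitmap. This avoids the merge pass while keeping each thread on its own cache lines.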