The full episode is only available to paid subscribers of Computer, Enhance!

Q&A #85 (2026-05-26)

Answers to questions from the last Q&A thread.

In each Q&A video, I answer questions from the comments on the previous Q&A video, which can be from any part of the course.

The questions addressed in this video are:

  • [0:00:03] “I would like to learn/understand more about performance oriented architecture design as most of the performance resources I see are on pinhole optimizations, do you have specific examples and general guidelines for that?”

  • [0:05:10] “What are the architectural considerations to allow the compiler to SIMDify your code?”

  • [0:08:13] “You once mentioned that heaps are slow in real life. Could you elaborate on why and which algorithms may be used instead in appropriate situations?”

  • [0:11:09] “I saw some advice outside of this course which was to try multiple overlapped reads at once and bypassing the file cache with FILE_FLAG_NO_BUFFERING. Trying this with 32 overlapped reads and reading the whole file into one big buffer (no chunking), I’m now reading at 6 GB/s. This is in line with my SSD read speed (7GB/s allegedly), but not all the numbers are making sense to me. I have two questions:

    How can bypassing the file cache and reading from SSD possibly be faster than reading from the file cache in memory? My main memory is 64GB so I imagine my whole file fits in the cache.

    Regardless of overlapped reads or bypassing the file cache, if I read the whole file into one big buffer, shouldn’t I be paying the cost of page faults regardless? I’d expect page faults and cache inefficiency would limit me to the original 2.5GB/s.”

  • [0:19:32] “With block interleaving, we are doing extra loads/stores, how much more expensive is that in terms of power consumption? does it make sense to do the in order interleaving on some kind of mobile device specifically for battery life or to help maintain higher boost clocks via lower temperatures?”

  • [0:21:54] “At my job we use Unreal and I find myself writing very defensive code. There is so much “if object exists, do thing, else, log an error and return” across the code base, it’s starting to feel like a waste of time. Especially since in a large number of cases, an early return still leaves the game in an unplayable state. The problem isn’t fixed; only moved elsewhere.

    Do you have any thoughts on when one should do this kind of granular error checking, when one should assert and when it’s okay to just let it crash?

    Is there a better way or do I just have to keep adding all this noise to my code?”

  • [0:25:40] “Is it fine to expose internals of a struct together with “helper functions” to calculate derived data? Should I not worry about exposing internals? Any pointers to where I can learn more about library design in C?”

  • [0:33:31] “In one QA you said that JIT and compile time are not mutually exclusive, but why I never seen some software written in a “JIT-less” language and use a JIT library to fill the compile time gap performance?”

  • [0:36:40] “Something that it’s still not clear to me is: how do I choose a RAM frequency when buying a PC?”

  • [0:40:09] “You said somewhere that you are not a fan of End To End (E2E) tests, why is that?”

  • [0:45:03] “I’m curious about what to do when a procedure has only one use in the code? How to conciliate this with semantic compression? Assuming that you can only compress something if it was written at least twice. Because sometimes, even though its used once, it makes easier to read.”

  • [0:47:26] “I was wondering: is a relational DB also the 35 year mistake? Since its a statically typed hierarchy that models the domain. My gut feeling is that its not, but it fits very well into the description.”

  • [0:49:32] “In this article the author cites some research papers about how the SOLID principles improve software: https://florian-kraemer.net/software-architecture/2025/02/24/Are-the-SOLID-Principles-problematic.html and related to that, the book Agile software development: Principles, Patterns and Practices by Bob Martin, also tries to give a measurement ‘framework’ for SOLID. Any thoughts on this?”

  • [0:53:53] “In OO code is common to use objects with private data and some methods and constructor to keep the internal data always in a valid state, but in a procedural way, like C, you can misuse a struct. How do you deal with this in C or in procedural way in general?”

  • [0:59:14] “Do you think that is worth creating content like the “OOPs” talk, “Clean Code Horrible Performance”, “Where does bad software come from?” (SOLID) where it tries to persuade the viewer that the “best practice” he was taught is bad? Because even though you are very sharp in your reasoning, your content still get misinterpreted for several reasons. I was thinking of doing something similar myself, grouping everything I learned from the “Handmade way” to present a better way to program, but I’m not sure if its worth it and I don’t have decades of experience so I’m afraid my argumentation would not be good enough.”

  • [1:05:04] “Related to the question above, how do I convince my coworkers that they are doing bad practices?”

  • [1:06:35] “How exactly do I apply WARMED at work?”

  • [1:08:59] “In the end of your IMGUI talk, you said that was excited to see what would be improved in this approach in the next years. So, after 20 years, which improvements you saw?”

  • [1:11:31] “Why compilers don’t have a #pragma DONT_REMOVE, #pragma END_DONT_REMOVE to make our lives easier instead of writing assembly code to do fake reads and writes for measuring performance for a block of code?”

  • [1:13:13] “I’m curious on what a web dev handmade hero would look like, because in the game version is possible to ship a commercial game without using any libraries, however I don’t think that is possible with web, since you need some libraries to deal with security/encryption like HTTPS. So how would you approach this if you were to make a handmade hero for web? Like showing how to create a substack clone, but better :)”

  • [1:14:59] “Knowing the specs of the Lion Cove P-Core of that CPU with 3x load units/ports it makes sense that there is no additional speedup after Read_x3. What puzzles me a bit/makes me curious though is the fact that Read_x3 is consistently the fastest. I tried it numerous times, locked the program to the same single P-core, built dedicated executables to make sure it has nothing to do with some code alignment etc. but Read_x3 is always the fastest with some ~2% edge over Read_x4. Of course 2% is not a huge difference, but I would have assumed that Read_x3 and Read_x4 would converge more or less or if at all that Read_x4 would be slightly faster due to less loop overhead in total.”

  • [1:19:22] “Now that you’re mainly on Linux, have you been using any of the profiling tools that Linux provides in <linux/perf_event.h> and if you do, what are your thoughts on them?” / “In a recent standup I heard how you are now fully switched to Linux and never going back to Windows. I would like to know, what are you now then using for debugging on Linux? As raddbg is not yet ready, though getting closer…”
