
Paid episode

The full episode is only available to paid subscribers of Computer, Enhance!

Q&A #76 (2025-05-23)

Answers to questions from the last Q&A thread.

In each Q&A video, I answer questions from the comments on the previous Q&A video, which can be from any part of the course.

The questions addressed in this video are:

  • [00:02] “Not the original author, but I have a follow-up question on the BTS/bits thread. On x86, writes are atomic only with byte granularity if I'm not mistaken, so in order for BTS to work properly and have multiple threads toggling bits _on the same byte_, there needs to be some sort of atomicity baked in for the RMW series of operations. Seeing that the hand-rolled series of operations doesn't have any lock/atomic op in there, wouldn't it have a bug when multiple threads try to toggle bits on the exact same byte?” (A sketch contrasting plain and atomic bit sets follows after this list.)

  • [08:00] “I have only dabbled in low-level OS and/or bare-metal hypervisor code, but when I did, I was surprised to find how much setup was required to get multiple cores working in x86. Surprisingly little was set up out of the box - it seemed that the cores all started off running the same code and accessing the same data, and I assume that each core's registers started off in the same state (not sure about the details). So, what are the main considerations to get from that initial blank state (post BIOS/entering 64-bit mode) to where your OS/hypervisor can have multiple cores running separate threads of code (and maybe scheduling them)?

    A related question, at the level of OS applications: You mentioned once that you prefer to roll your own multithreading code instead of relying on libraries. Having only ever used threading libraries in various languages, it's not clear where to even start, or how my hand-rolled threading code would even benefit me over generic threading libraries. What are the main considerations for rolling my own threading code instead of using something like pthreads? Or am I confusing threading libraries with OS-level threading APIs?” (A minimal hand-rolled threading sketch follows after this list.)

  • [18:12] “As a developer with 10 years of experience in Java, I’m considering a career move—not due to layoffs, but in search of a more fulfilling role. (I am also considering other languages, not just Java.)

    I know companies like RAD Game Tools are rare (if not unique), so I’m curious:

    Where would you recommend looking for opportunities that prioritize meaningful work?”

  • [25:26] “I imagine that you don't write your own math implementations for all your projects, since you emphasized that we do this for educational purposes. What do you actually use?”

  • [29:36] “I recently had an assignment for a 2D graphics course at my university, where I had to implement an edge detection filter in Python and C. The goal of the assignment was to compare the performance and later implement a Cython version of the function. I implemented an AVX version and the professor was super stoked. He told me that he used to fiddle with MMX, but that it was always a pain to get it to compile and that it never ran on other machines because of incompatibility. Was it really that bad back in the day? I feel like even AVX2 has pretty widespread support on most computers today, and getting something to compile is just one compiler flag. How do you deal with different architectures / different feature sets to make it more convenient to program? Do you have a wrapper? If so, would you mind sharing it?” (A runtime feature-dispatch sketch follows after this list.)

  • [34:57] “I finally put together a minimal reproducer for the BTS question.”

  • [35:25] “What is the performance difference between the original haversine and the new haversine with your own sin/cos/etc. functions? I feel like that was missing from the last video.”

  • [36:00] “If you could design your own computer science undergrad program, what sorts of classes would you include and focus on and what would you change from the way current universities do things?”
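For readers skimming the [00:02] question: the issue it raises is the difference between a plain read-modify-write and an atomic one. As a rough illustration only (my own C11 sketch, not code from the video or the original thread), a plain `|=` can lose a concurrent thread's update, while `atomic_fetch_or` performs the whole read-modify-write indivisibly and on x86-64 typically compiles to a LOCK-prefixed instruction:

```c
// Minimal sketch (illustration only): contrast a plain read-modify-write bit set
// with an atomic one when several threads may target bits in the same word.
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

static uint64_t PlainBits;           // ordinary memory
static _Atomic uint64_t AtomicBits;  // C11 atomic

// Non-atomic RMW: load, OR, store. If another thread stores to the same word
// between our load and our store, its bit is silently overwritten (lost update).
static void SetBitPlain(uint32_t Index)
{
    PlainBits |= (1ull << Index);
}

// Atomic RMW: the entire read-modify-write is indivisible. On x86-64 this
// typically becomes a LOCK-prefixed OR (or LOCK BTS), so concurrent setters
// never lose each other's bits, even within the same byte.
static void SetBitAtomic(uint32_t Index)
{
    atomic_fetch_or(&AtomicBits, 1ull << Index);
}

int main(void)
{
    SetBitPlain(3);
    SetBitAtomic(3);
    printf("plain=%llx atomic=%llx\n",
           (unsigned long long)PlainBits,
           (unsigned long long)atomic_load(&AtomicBits));
    return 0;
}
```

Whether any particular hand-rolled sequence is safe depends on whether the threads can actually touch the same word; the sketch only shows why a missing lock/atomic op matters when they can.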
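For the second half of the [08:00] question, one way to read “rolling your own” is calling the OS thread API directly and distributing work yourself, rather than going through a generic task library. A minimal sketch under that assumption (my own illustration using pthreads and a single atomic work counter; not the approach from the video):

```c
// Sketch: spawn OS threads directly and hand out work with one atomic counter,
// instead of routing everything through a generic task/threading library.
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define WORK_COUNT 1024
static _Atomic int NextWorkIndex;     // shared cursor into the work array
static double Results[WORK_COUNT];

static void DoOneItem(int Index)
{
    Results[Index] = Index * 2.0;     // stand-in for real per-item work
}

static void *WorkerThread(void *Arg)
{
    (void)Arg;
    for(;;)
    {
        // Atomically claim the next item; each index goes to exactly one thread.
        int Index = atomic_fetch_add(&NextWorkIndex, 1);
        if(Index >= WORK_COUNT) break;
        DoOneItem(Index);
    }
    return 0;
}

int main(void)
{
    pthread_t Threads[4];
    for(int I = 0; I < 4; ++I) pthread_create(&Threads[I], 0, WorkerThread, 0);
    for(int I = 0; I < 4; ++I) pthread_join(Threads[I], 0);
    printf("last result: %f\n", Results[WORK_COUNT - 1]);
    return 0;
}
```

On Linux, pthreads is itself the OS-level API, which is part of what the question is getting at; the distinction is mostly between calling that layer directly versus adopting a higher-level library's scheduling and work-distribution model.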
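And for the [29:36] question about running the same binary on machines with different SIMD feature sets, the usual pattern is to compile each wide path separately and pick one at runtime with CPUID. A GCC/Clang-flavored sketch (my own illustration using the per-function target attribute and __builtin_cpu_supports; this is not the wrapper the questioner asked about):

```c
// Sketch of runtime feature dispatch: build the AVX2 path with a per-function
// target attribute, keep a scalar fallback, and choose once at startup.
#include <immintrin.h>
#include <stddef.h>
#include <stdio.h>

static void AddScalar(float *Dest, const float *A, const float *B, size_t Count)
{
    for(size_t I = 0; I < Count; ++I) Dest[I] = A[I] + B[I];
}

__attribute__((target("avx2")))
static void AddAVX2(float *Dest, const float *A, const float *B, size_t Count)
{
    size_t I = 0;
    for(; I + 8 <= Count; I += 8)   // 8 floats per 256-bit register
    {
        __m256 Sum = _mm256_add_ps(_mm256_loadu_ps(A + I), _mm256_loadu_ps(B + I));
        _mm256_storeu_ps(Dest + I, Sum);
    }
    for(; I < Count; ++I) Dest[I] = A[I] + B[I];   // scalar tail
}

typedef void AddFunc(float *Dest, const float *A, const float *B, size_t Count);

static AddFunc *ChooseAdd(void)
{
    // GCC/Clang builtin that queries CPUID; on MSVC you would call __cpuidex yourself.
    return __builtin_cpu_supports("avx2") ? AddAVX2 : AddScalar;
}

int main(void)
{
    float A[9] = {1,2,3,4,5,6,7,8,9}, B[9] = {9,8,7,6,5,4,3,2,1}, D[9];
    AddFunc *Add = ChooseAdd();
    Add(D, A, B, 9);
    printf("%f %f\n", D[0], D[8]);
    return 0;
}
```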
