Q&A #44 (2024-02-26)

Paid episode

The full episode is only available to paid subscribers of Computer, Enhance!

Answers to questions from the last Q&A thread.

Feb 27, 2024 ∙ Paid

In each Q&A video, I answer questions from the comments on the previous Q&A video, which can be from any part of the course.

Questions addressed in this video:

[00:04] “So regarding this dependency that gets created by using AL: does this apply for all MOVs that aren't full 64 bit? So EAX and AX also take this hit?”
[02:27] “Given the idea that 'doing more things at the same time is faster', how come we have not seen a mainstream 'CPU' that works more like an FPGA: configure the logic gates to do a given task, in a way that can process hundreds or thousands of things in parallel, run the task through the network, reconfigure, repeat?”
[05:22] “Why should I care about NP completeness? Does it have any practical use, or is it basically computer science mumbo jumbo? Should I care about analyzing my algorithms to see if they're NP complete or not?”
[18:17] “Why in your code you tend to use integer types with explicit sizes, uint64_t for example, instead of using more generic types like int, long, size_t, ... that give you guarantees of being the biggest that is currently available on the given machine. Do you use those types in some situations?” / “Why do you use b32 instead of bool?”
[30:20] “I'm failing to grasp how we count bytes for throughput calculations in things like NOPAllBytesASM or DECAllBytesASM which are not moving bytes to memory as opposed to things like WriteToAllBytes or MOVAllBytesASM.”
[33:38] “I'm using an Ivy Bridge CPU with an estimated rdtsc frequency of 3.4 GHz. Running the conditional NOP tests I find that NeverTaken and AlwaysTaken have the same throughput of 1.7 (as do Every 2, 3 and 4; Random drops to 0.3). It seems that any loop with a branch in it has a ceiling of 1.7. The optimization manual shows that Ivy Bridge has only one branch port whereas Skylake has two. I think that explains the difference in our results. Does that sound plausible?”
[35:05] “I should be able to do every program I want only using sys calls, right? However, for graphics programming I straight up need to work with macOS Frameworks (Cocoa/Metal), why is this necessary?”
[38:30] “You talked about the ABI for calling functions even if they belong to a library. Constantly I see that the way that the library is exposed is via header files, however it is easy for my programming language not to work with those, so I need to explicitly say the symbols I expect and how to call them. For that I face something strange, a c++/obj-c/Swift library exposes symbols that are really different from what I manually see in the header. For example to call the function CreateContext in this header file https://github.com/ocornut/imgui/blob/master/imgui.h#L298, when I build a static library and look at the symbols for that function I find that the name is actually called “ZN5ImGui13CreateContextEP11ImFontAtlas”. In the end I see that somebody has to write a “wrapper” for C first, and then from that C wrapper to each language. Is this truly necessary? How would you work with this?”
[43:56] “I see that to work with a GPU I need to write some code in a GPU language like MSL (Metal). How does that work?”
[52:27] “Related to the "Linking Directly to ASM for Experimentation" video. I tested it in macOS (on an M1 Pro) and I don't see any significant difference between the implementations. I checked the assembly of all functions and they all behave as expected. How can I investigate why there's no difference? Or any suggestions on what can be happening?”
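
As background to the [18:17] question (this is not the answer given in the video): C does not guarantee that int, long, or size_t is the largest type available. It only guarantees minimum ranges (int at least 16 bits, long at least 32, long long at least 64). On 64-bit Windows, long is 32 bits, while on 64-bit Linux and macOS it is 64 bits. The fixed-width types such as uint32_t and uint64_t always have exactly the size their names state. The minimal sketch below is illustrative only; the function name c_integer_sizes is hypothetical. It uses Python's ctypes to report how the C ABI of the running Python build sizes each type.

```python
import ctypes
import platform

def c_integer_sizes():
    """Return sizeof() in bytes for several C integer types, as seen by this
    Python build's C ABI. The "generic" types (int, long, size_t) can differ
    between platforms; the fixed-width types cannot."""
    return {
        "int": ctypes.sizeof(ctypes.c_int),
        "long": ctypes.sizeof(ctypes.c_long),          # 4 on Win64, 8 on 64-bit Linux/macOS
        "long long": ctypes.sizeof(ctypes.c_longlong),
        "size_t": ctypes.sizeof(ctypes.c_size_t),
        "uint32_t": ctypes.sizeof(ctypes.c_uint32),    # always exactly 4
        "uint64_t": ctypes.sizeof(ctypes.c_uint64),    # always exactly 8
    }

if __name__ == "__main__":
    print(platform.system(), platform.machine())
    for name, size in c_integer_sizes().items():
        print(f"{name:>10}: {size} bytes")
```

Running this on different operating systems is a quick way to see why code that must have a specific width names that width explicitly rather than relying on long.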
