The full episode is only available to paid subscribers of Computer, Enhance!

Q&A #46 (2024-03-11)

Answers to questions from the last Q&A thread.

In each Q&A video, I answer questions from the comments on the previous Q&A video, which can be from any part of the course.

Questions addressed in this video:

  • [00:03] “I've read that L1 uses virtual addresses to run virtual->physical address translation in parallel with cache lookup. If L2 does the same, that would explain L3 speed for zero page on Linux.”

  • [02:22] “Could you please explain how L2 can provide dramatically better latency compared with L3, but the same throughput? If we can issue the same number of requests per cycle, and get responses faster (smaller latency), then we will get higher throughput, no?”

  • [14:12] “Do you use intrinsics for your regular SIMD programming? If not, what do you use?”

  • [14:44] “Do you use NASM for your own personal work as well, or do you use MASM there? Are there any advantages to MASM's high-level macros?”

  • [18:07] “Is it important to know the cache line size?”

  • [21:09] “I'm measuring 256B, but the actual size is 64B. Could this be because of the ability to load 4 cache lines in 1 cycle, independently of where they are? Is that because of the AVX2 architecture? (My chip is a Ryzen 5 Pro.)”

  • [22:32] “The transition occurs between 32KB and 40KB, while my L1 cache is supposedly 64KB. It is a cache that is supposed to be core-specific, but maybe it contains some other stuff?”

  • [24:06] “Do you think that at some point in the future CPU vendors will discontinue single value registers and only have SIMD registers? (since you can of course still do single value operations with SIMD)”

  • [26:33] “My question is whether there is any performance consideration in using dynamic allocation like malloc at the startup of your code, let's say to allocate 4GB of memory for a memory arena and limiting yourself to it, or just allocating a 4GB array statically and globally. I remember Casey doing this on Handmade and commenting that he would not statically allocate 4GB (but I don't remember if he explained why). Maybe there will be a moment in the course where the question is more relevant, but this has been on my mind for a while. I always see people allocating all memory at startup, but never doing the allocation at compile time.”

Computer, Enhance!
Programming Courses
A series of courses on programming topics.