<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Computer, Enhance!: Programming Courses]]></title><description><![CDATA[A series of courses on programming topics.]]></description><link>https://www.computerenhance.com/s/programming-courses</link><image><url>https://substackcdn.com/image/fetch/$s_!7DRL!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4c646b6-92ad-4e9d-95da-e629e19689f4_800x800.png</url><title>Computer, Enhance!: Programming Courses</title><link>https://www.computerenhance.com/s/programming-courses</link></image><generator>Substack</generator><lastBuildDate>Sun, 14 Jun 2026 00:15:44 GMT</lastBuildDate><atom:link href="https://www.computerenhance.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Casey Muratori]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[computerenhance@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[computerenhance@substack.com]]></itunes:email><itunes:name><![CDATA[Casey Muratori]]></itunes:name></itunes:owner><itunes:author><![CDATA[Casey Muratori]]></itunes:author><googleplay:owner><![CDATA[computerenhance@substack.com]]></googleplay:owner><googleplay:email><![CDATA[computerenhance@substack.com]]></googleplay:email><googleplay:author><![CDATA[Casey Muratori]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Q&A #86 (2026-06-10)]]></title><description><![CDATA[Answers to questions from the last Q&A thread.]]></description><link>https://www.computerenhance.com/p/q-and-a-86-2026-06-10</link><guid 
isPermaLink="false">https://www.computerenhance.com/p/q-and-a-86-2026-06-10</guid><pubDate>Wed, 10 Jun 2026 16:00:39 GMT</pubDate><enclosure url="https://substack-video.s3.amazonaws.com/video_upload/post/201377389/e275e24d-f0d0-4a43-981f-de8145d8210f/transcoded-261239.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>In each Q&amp;A video, I answer questions from the comments on the previous Q&amp;A video, which can be from any part of the course.</em></p><p><em>For those wondering what the name of the audio API was that made me go &#8220;I can&#8217;t believe I am forgetting the acronym&#8221; in the middle of this video, it was <a href="https://en.wikipedia.org/wiki/Audio_Stream_Input/Output">ASIO</a>.</em></p><p>The questions addressed in this video are:</p><ul><li><p><strong>[00:02]</strong> &#8220;The D language actually has this because of Uniform Function Call Syntax. You could have a struct like &#8230; and have the rest of the code be none the wiser (even code like foo.x = 7). I think C# has stuff like this too. Is this what you&#8217;d be hoping for? If so, I&#8217;m surprised - I&#8217;d have guessed you usually don&#8217;t prefer implicit magic like this happening on things that don&#8217;t look like function calls.&#8221;</p></li><li><p><strong>[04:37]</strong> &#8220;Could someone remind me what the &#8216;warm&#8217; criteria was?&#8221;</p></li><li><p><strong>[06:00]</strong> &#8220;I have a question about managing intrusive thoughts while writing code. When thoughts come up about edge cases, performance, reliability, or decisions that may need to be revisited later, how do you decide whether to address them immediately or set them aside? I&#8217;m finding it difficult to iterate quickly because I often get pulled into these concerns before I&#8217;ve built a working v0. 
I&#8217;ve tried writing TODOs in the code and keeping separate notes, but neither approach has worked well for me.&#8221;</p></li><li><p><strong>[15:47]</strong> &#8220;I now have a question about wasapi (I believe you now use wasapi but if I&#8217;m wrong then you can ignore this question). I&#8217;m writing the sound system for my codebase and had my wav parser ready, mixer chugging along, sat back to enjoy a song but then found out that changing the system volume causes these pops.... I know that changing the volume of a sound wave causes the discontinuity which manifests itself as audio popping but in this case where it&#8217;s not an internal volume I have control over I don&#8217;t know how to fix these. It seems to me as if windows just immediately applied the new volume and my futile attempt of trying to bypass windows&#8217; volume logic didn&#8217;t work since they seem to first clamp my samples and then multiply by the volume... I tried some wasapi gist examples and they all had this same issue. Have you had this problem and fixed it in your own code and could you share your knowledge if you have?&#8221;</p></li><li><p><strong>[25:07]</strong> &#8220;I formed my question about heaps slightly misleading. It wasn&#8217;t about a memory heap, but a data structure, like in the heap sort. In the video together with Primeagen regarding code interviews you commented that heaps are quite slow. So I was wondering, what should you use instead, when the need arises?&#8221;</p></li><li><p><strong>[33:27]</strong> &#8220;I&#8217;m working on a project that uses a cortex-m7, a fairly fast 32-bit embedded core. It has some fancy pipeline features of big boy desktop chips, like a branch predictor, and the ability to dual issue many instructions, but doesn&#8217;t have a RAT (it&#8217;s not &#8220;out-of-order&#8221; at least as far as I know). 
Do I understand correctly that in CPUs without a RAT, any two adjacent instructions that use the same register are effectively serially dependent? Is this how it used to work in the desktop world back in the day?&#8221;</p></li><li><p><strong>[40:53]</strong> &#8220;This question relates to Part 5: Dependency Chain Stalls and In-order Interleaving. From these results, I would expect 12-way interleaving to maintain peak throughput regardless of chain length (4 FMAs/cycle * 3 cycles = 12 FMAs in flight needed to cover the latency). However, that&#8217;s not what I observe. With 12-way interleaving, throughput decreases from ~3.6 to ~3.0 FMAs/cycle as the chain length grows. I need a minimum of 16-way interleaving for throughput to stay flat at ~3.8 FMAs/cycle. My conclusion is that the CPU can&#8217;t sustain the 3-cycle latency when the FMA execution ports are saturated, resulting in an effective latency of 4 cycles under load. This would explain why I need 4 * 4 = 16 chains to obtain a flat throughput curve. I think you&#8217;ve hinted at this in some of your videos. Would you kindly explain why this happens? Thanks!&#8221;</p></li></ul>
      <p>
          <a href="https://www.computerenhance.com/p/q-and-a-86-2026-06-10">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[How to Use uops.info]]></title><description><![CDATA[Now that we've done our own microarchitecture investigations, it's time to get familiar with one of the best x64 microarchitecture data sites.]]></description><link>https://www.computerenhance.com/p/how-to-use-uopsinfo</link><guid isPermaLink="false">https://www.computerenhance.com/p/how-to-use-uopsinfo</guid><pubDate>Sat, 30 May 2026 03:58:08 GMT</pubDate><enclosure url="https://substack-video.s3.amazonaws.com/video_upload/post/199798918/cdd5dbc2-c1b8-407a-af0f-33318b8bb044/transcoded-224503.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fdNi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F910ab22a-20db-4541-b35d-734a0dbf3329_1920x622.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fdNi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F910ab22a-20db-4541-b35d-734a0dbf3329_1920x622.jpeg 424w, https://substackcdn.com/image/fetch/$s_!fdNi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F910ab22a-20db-4541-b35d-734a0dbf3329_1920x622.jpeg 848w, https://substackcdn.com/image/fetch/$s_!fdNi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F910ab22a-20db-4541-b35d-734a0dbf3329_1920x622.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!fdNi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F910ab22a-20db-4541-b35d-734a0dbf3329_1920x622.jpeg 
1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fdNi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F910ab22a-20db-4541-b35d-734a0dbf3329_1920x622.jpeg" width="1456" height="472" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/910ab22a-20db-4541-b35d-734a0dbf3329_1920x622.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:472,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:357350,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.computerenhance.com/i/199798918?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F910ab22a-20db-4541-b35d-734a0dbf3329_1920x622.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fdNi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F910ab22a-20db-4541-b35d-734a0dbf3329_1920x622.jpeg 424w, https://substackcdn.com/image/fetch/$s_!fdNi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F910ab22a-20db-4541-b35d-734a0dbf3329_1920x622.jpeg 848w, https://substackcdn.com/image/fetch/$s_!fdNi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F910ab22a-20db-4541-b35d-734a0dbf3329_1920x622.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!fdNi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F910ab22a-20db-4541-b35d-734a0dbf3329_1920x622.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><em>This is the twelfth video in Part 5 of the Performance-Aware Programming series. 
Please see the <a href="https://www.computerenhance.com/p/table-of-contents">Table of Contents</a> to quickly navigate through the rest of the course as it is updated, and <a href="https://github.com/cmuratori/computer_enhance">the code repository</a> for downloadable code listings.</em></p>
      <p>
          <a href="https://www.computerenhance.com/p/how-to-use-uopsinfo">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Q&A #85 (2026-05-26)]]></title><description><![CDATA[Answers to questions from the last Q&A thread.]]></description><link>https://www.computerenhance.com/p/q-and-a-85-2026-05-26</link><guid isPermaLink="false">https://www.computerenhance.com/p/q-and-a-85-2026-05-26</guid><pubDate>Wed, 27 May 2026 06:49:39 GMT</pubDate><enclosure url="https://substack-video.s3.amazonaws.com/video_upload/post/199260802/94d4665c-9e1e-4ba7-9b4c-b3919c3ae13e/transcoded-206587.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>In each Q&amp;A video, I answer questions from the comments on the previous Q&amp;A video, which can be from any part of the course.</em></p><p>The questions addressed in this video are:</p><ul><li><p><strong>[0:00:03]</strong> &#8220;I would like to learn/understand more about performance oriented architecture design as most of the performance resources I see are on pinhole optimizations, do you have specific examples and general guidelines for that?&#8221;</p></li><li><p><strong>[0:05:10]</strong> &#8220;What are the architectural considerations to allow the compiler to SIMDify your code?&#8221;</p></li><li><p><strong>[0:08:13]</strong> &#8220;You once mentioned that heaps are slow in real life. Could you elaborate on why and which algorithms may be used instead in appropriate situations?&#8221;</p></li><li><p><strong>[0:11:09]</strong> &#8220;I saw some advice outside of this course which was to try multiple overlapped reads at once and bypassing the file cache with FILE_FLAG_NO_BUFFERING. Trying this with 32 overlapped reads and reading the whole file into one big buffer (no chunking), I&#8217;m now reading at 6 GB/s. This is in line with my SSD read speed (7GB/s allegedly), but not all the numbers are making sense to me. I have two questions:</p><p>How can bypassing the file cache and reading from SSD possibly be faster than reading from the file cache in memory? 
My main memory is 64GB so I imagine my whole file fits in the cache.</p><p>Regardless of overlapped reads or bypassing the file cache, if I read the whole file into one big buffer, shouldn&#8217;t I be paying the cost of page faults regardless? I&#8217;d expect page faults and cache inefficiency would limit me to the original 2.5GB/s.&#8221;</p></li><li><p><strong>[0:19:32]</strong> &#8220;With block interleaving, we are doing extra loads/stores, how much more expensive is that in terms of power consumption? does it make sense to do the in order interleaving on some kind of mobile device specifically for battery life or to help maintain higher boost clocks via lower temperatures?&#8221;</p></li><li><p><strong>[0:21:54]</strong> &#8220;At my job we use Unreal and I find myself writing very defensive code. There is so much &#8220;if object exists, do thing, else, log an error and return&#8221; across the code base, it&#8217;s starting to feel like a waste of time. Especially since in a large number of cases, an early return still leaves the game in an unplayable state. The problem isn&#8217;t fixed; only moved elsewhere.</p><p>Do you have any thoughts on when one should do this kind of granular error checking, when one should assert and when it&#8217;s okay to just let it crash?</p><p>Is there a better way or do I just have to keep adding all this noise to my code?&#8221;</p></li><li><p><strong>[0:25:40]</strong> &#8220;Is it fine to expose internals of a struct together with &#8220;helper functions&#8221; to calculate derived data? Should I not worry about exposing internals? 
Any pointers to where I can learn more about library design in C?&#8221;</p></li><li><p><strong>[0:33:31]</strong> &#8220;In one QA you said that JIT and compile time are not mutually exclusive, but why I never seen some software written in a &#8220;JIT-less&#8221; language and use a JIT library to fill the compile time gap performance?&#8221;</p></li><li><p><strong>[0:36:40]</strong> &#8220;Something that its still not clear to me is: how do I choose a RAM frequency when buying a PC?&#8221;</p></li><li><p><strong>[0:40:09]</strong> &#8220;You said somewhere that you are not a fan of End To End (E2E) tests, why is that?&#8221;</p></li><li><p><strong>[0:45:03]</strong> &#8220;I&#8217;m curious about what to do when a procedure has only one use in the code? How to conciliate this with semantic compression? Assuming that you can only compress something if it was written at least twice. Because sometimes, even though its used once, it makes easier to read.&#8221;</p></li><li><p><strong>[0:47:26]</strong> &#8220;I was wondering: is a relational DB also the 35 year mistake? Since its a statically typed hierarchy that models the domain. My gut feeling is that its not, but it fits very well into the description.&#8221;</p></li><li><p><strong>[0:49:32]</strong> &#8220;In this article the author cites some research papers about how the SOLID principles improve software: https://florian-kraemer.net/software-architecture/2025/02/24/Are-the-SOLID-Principles-problematic.html and related to that, the book <em>Agile software development: Principles, Patterns and Practices</em> by Bob Martin, also tries to give a measurement &#8216;framework&#8217; for SOLID. Any thoughts on this?&#8221;</p></li><li><p><strong>[0:53:53]</strong> &#8220;In OO code is common to use objects with private data and some methods and constructor to keep the internal data always in a valid state, but in a procedural way, like C, you can misuse a struct. 
How do you deal with this in C or in procedural way in general?&#8221;</p></li><li><p><strong>[0:59:14]</strong> &#8220;Do you think that is worth creating content like the &#8220;OOPs&#8221; talk, &#8220;Clean Code Horrible Performance&#8221;, &#8220;Where does bad software come from?&#8221; (SOLID) where it tries to persuade the viewer that the &#8220;best practice&#8221; he was taught is bad? Because even though you are very sharp in your reasoning, your content still get misinterpreted for several reasons. I was thinking of doing something similar myself, grouping everything I learned from the &#8220;Handmade way&#8221; to present a better way to program, but I&#8217;m not sure if its worth it and I don&#8217;t have decades of experience so I&#8217;m afraid my argumentation would not be good enough.&#8221;</p></li><li><p><strong>[1:05:04]</strong> &#8220;Related to the question above, how do I convince my coworkers that they are doing bad practices?&#8221;</p></li><li><p><strong>[1:06:35]</strong> &#8220;How exactly do I apply WARMED at work?&#8221;</p></li><li><p><strong>[1:08:59]</strong> &#8220;In the end of your IMGUI talk, you said that was excited to see what would be improved in this approach in the next years. So, after 20 years, which improvements you saw?&#8221;</p></li><li><p><strong>[1:11:31]</strong> &#8220;Why compilers don&#8217;t have a #pragma DONT_REMOVE, #pragma END_DONT_REMOVE to make our lives easier instead of writing assembly code to do fake reads and writes for measuring performance for a block of code?&#8221;</p></li><li><p><strong>[1:13:13]</strong> &#8220;I&#8217;m curious on what a web dev handmade hero would look like, because in the game version is possible to ship a commercial game without using any libraries, however I don&#8217;t think that is possible with web, since you need some libraries to deal with security/encryption like HTTPS. So how would you approach this if you were to make a handmade hero for web? 
Like showing how to create a substack clone, but better :)&#8221;</p></li><li><p><strong>[1:14:59]</strong> &#8220;Knowing the specs of the Lion Cove P-Core of that CPU with 3x load units/ports it makes sense that there is no additional speedup after Read_x3. What puzzles me a bit/makes me curious though is the fact that Read_x3 is consistently the fastest. I tried it numerous times, locked the program to the same single P-core, built dedicated executables to make sure it has nothing to do with some code alignment etc. but Read_x3 is always the fastest with some ~2% edge over Read_x4. Of course 2% is not a huge difference, but I would have assumed that Read_x3 and Read_x4 would converge more or less or if at all that Read_x4 would be slightly faster due to less loop overhead in total.&#8221;</p></li><li><p><strong>[1:19:22]</strong> &#8220;Now that you&#8217;re mainly on Linux, have you been using any of the profiling tools that Linux provides in &lt;linux/perf_event.h&gt; and if you do, what are your thoughts on them?&#8221; / &#8220;In a recent standup I heard how you are now fully switched to Linux and never going back to Windows. I would like to know, what are you now then using for debugging on Linux? As raddbg is not yet ready, though getting closer&#8230;&#8221;</p></li></ul>
      <p>
          <a href="https://www.computerenhance.com/p/q-and-a-85-2026-05-26">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Block Interleaving]]></title><description><![CDATA[Breaking up dependency chains to better suit the processor's out-of-order scheduling gets most of the benefit of in-order interleaving without requiring a fully interleaved instruction stream.]]></description><link>https://www.computerenhance.com/p/block-interleaving</link><guid isPermaLink="false">https://www.computerenhance.com/p/block-interleaving</guid><pubDate>Tue, 28 Apr 2026 22:45:56 GMT</pubDate><enclosure url="https://substack-video.s3.amazonaws.com/video_upload/post/195810514/9101c701-4c74-4b8f-a18d-de21b13ce504/transcoded-44129.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fEGb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F704fff5b-a28e-4f85-9ea7-c36f478d496f_1920x826.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fEGb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F704fff5b-a28e-4f85-9ea7-c36f478d496f_1920x826.jpeg 424w, https://substackcdn.com/image/fetch/$s_!fEGb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F704fff5b-a28e-4f85-9ea7-c36f478d496f_1920x826.jpeg 848w, https://substackcdn.com/image/fetch/$s_!fEGb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F704fff5b-a28e-4f85-9ea7-c36f478d496f_1920x826.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!fEGb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F704fff5b-a28e-4f85-9ea7-c36f478d496f_1920x826.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fEGb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F704fff5b-a28e-4f85-9ea7-c36f478d496f_1920x826.jpeg" width="1456" height="626" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/704fff5b-a28e-4f85-9ea7-c36f478d496f_1920x826.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:626,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:625880,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.computerenhance.com/i/195810514?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F704fff5b-a28e-4f85-9ea7-c36f478d496f_1920x826.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fEGb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F704fff5b-a28e-4f85-9ea7-c36f478d496f_1920x826.jpeg 424w, https://substackcdn.com/image/fetch/$s_!fEGb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F704fff5b-a28e-4f85-9ea7-c36f478d496f_1920x826.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!fEGb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F704fff5b-a28e-4f85-9ea7-c36f478d496f_1920x826.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!fEGb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F704fff5b-a28e-4f85-9ea7-c36f478d496f_1920x826.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><em>This is the eleventh video in Part 5 of the Performance-Aware Programming series. 
Please see the <a href="https://www.computerenhance.com/p/table-of-contents">Table of Contents</a> to quickly navigate through the rest of the course as it is updated, and <a href="https://github.com/cmuratori/computer_enhance">the code repository</a> for downloadable code listings.</em></p>
      <p>
          <a href="https://www.computerenhance.com/p/block-interleaving">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Q&A #84 (2026-04-20)]]></title><description><![CDATA[Answers to questions from the last Q&A thread.]]></description><link>https://www.computerenhance.com/p/q-and-a-84-2026-04-20</link><guid isPermaLink="false">https://www.computerenhance.com/p/q-and-a-84-2026-04-20</guid><pubDate>Tue, 21 Apr 2026 02:46:08 GMT</pubDate><enclosure url="https://substack-video.s3.amazonaws.com/video_upload/post/194840638/9e584af3-4ea6-4c55-ba8d-3951879bbcc0/transcoded-61489.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>In each Q&amp;A video, I answer questions from the comments on the previous Q&amp;A video, which can be from any part of the course.</em></p><p>The questions addressed in this video are:</p><ul><li><p><strong>[00:04]</strong> &#8220;Quick question (for Casey or anyone with a suggestion!): I am neither a novice nor an experienced programmer, and would like to learn enough C++ to write some useful code, but I am struggling to find good material.&#8221;</p></li><li><p><strong>[02:18]</strong> &#8220;In the episode &#8216;In-order Interleaving&#8217;, you have a loop that handles multiple elements per loop iteration. When the list of elements isn&#8217;t a multiple of your &#8216;elements per loop&#8217;, we need something after the loop to handle the last few remaining elements. My question is: Is there a particular way to write this &#8216;residual handling&#8217; part? Whenever I&#8217;ve had to do this in my own code it always felt a little awkward.&#8221;</p></li><li><p><strong>[13:25]</strong> &#8220;Hey Casey, what&#8217;s your take on Arm making their own chips now? Sincerely, an Arm engineer.&#8221;</p></li><li><p><strong>[17:05]</strong> &#8220;On the topic of growable arenas, concretely, how do you implement them with respect to the &#8216;layered&#8217; architecture? 
To grow them, naturally you would need to mmap or VirtualAlloc more memory, but if you&#8217;re far removed from the platform layer and aren&#8217;t writing a program that is frame-based (for example, a compiler) and thus have no opportune time to go back to the platform layer and request more memory, what are your strategies? I can only come up with round-tripping back to the OS using a platform-layer API like &#8216;RequestMoreMemory()&#8217;, or something like this. For more context, this would be for the kind of program that also cannot place upper-bounds on memory usage (again, like in a compiler), but still does not want to be malloc&#8217;ing/free&#8217;ing excessively. I&#8217;m keen to hear how you would approach this problem?&#8221;</p></li><li><p><strong>[26:14]</strong> &#8220;Have you had occasion to use coroutines to write algorithms that must pause/resume? E.g., protocols in networking or software in a hardware simulator.&#8221;</p></li></ul>
      <p>
          <a href="https://www.computerenhance.com/p/q-and-a-84-2026-04-20">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[In-order Interleaving]]></title><description><![CDATA[By handing the CPU an instruction stream it can execute in order, we can exceed the limits we hit when we rely on its out-of-order execution capabilities.]]></description><link>https://www.computerenhance.com/p/in-order-interleaving</link><guid isPermaLink="false">https://www.computerenhance.com/p/in-order-interleaving</guid><pubDate>Thu, 16 Apr 2026 03:09:52 GMT</pubDate><enclosure url="https://substack-video.s3.amazonaws.com/video_upload/post/194365258/709f78f7-1627-4ee5-bf05-c53a76e185a0/transcoded-47060.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CKM1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc63b4f35-7718-4927-85e6-2f53f12fff82_5634x2864.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CKM1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc63b4f35-7718-4927-85e6-2f53f12fff82_5634x2864.jpeg 424w, https://substackcdn.com/image/fetch/$s_!CKM1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc63b4f35-7718-4927-85e6-2f53f12fff82_5634x2864.jpeg 848w, https://substackcdn.com/image/fetch/$s_!CKM1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc63b4f35-7718-4927-85e6-2f53f12fff82_5634x2864.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!CKM1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc63b4f35-7718-4927-85e6-2f53f12fff82_5634x2864.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CKM1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc63b4f35-7718-4927-85e6-2f53f12fff82_5634x2864.jpeg" width="1456" height="740" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c63b4f35-7718-4927-85e6-2f53f12fff82_5634x2864.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:740,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2362184,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.computerenhance.com/i/194365258?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc63b4f35-7718-4927-85e6-2f53f12fff82_5634x2864.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CKM1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc63b4f35-7718-4927-85e6-2f53f12fff82_5634x2864.jpeg 424w, https://substackcdn.com/image/fetch/$s_!CKM1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc63b4f35-7718-4927-85e6-2f53f12fff82_5634x2864.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!CKM1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc63b4f35-7718-4927-85e6-2f53f12fff82_5634x2864.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!CKM1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc63b4f35-7718-4927-85e6-2f53f12fff82_5634x2864.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><em>This is the tenth video in Part 5 of the Performance-Aware Programming series.
Please see the <a href="https://www.computerenhance.com/p/table-of-contents">Table of Contents</a> to quickly navigate through the rest of the course as it is updated, and <a href="https://github.com/cmuratori/computer_enhance">the code repository</a> for downloadable code listings.</em></p>
      <p>
          <a href="https://www.computerenhance.com/p/in-order-interleaving">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Q&A #83 (2026-03-11)]]></title><description><![CDATA[Answers to questions from the last Q&A thread.]]></description><link>https://www.computerenhance.com/p/q-and-a-83-2026-03-11</link><guid isPermaLink="false">https://www.computerenhance.com/p/q-and-a-83-2026-03-11</guid><pubDate>Thu, 12 Mar 2026 04:19:57 GMT</pubDate><enclosure url="https://substack-video.s3.amazonaws.com/video_upload/post/190667397/dd549d8f-4ef2-4367-8a4f-573842929d35/transcoded-194421.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>In each Q&amp;A video, I answer questions from the comments on the previous Q&amp;A video, which can be from any part of the course.</em></p><p>The questions addressed in this video are:</p><ul><li><p><strong>[0:00:03]</strong> &#8220;Building on using free lists to manage objects with complex lifetimes, how do you handle cases where an object type can vary in size?&#8221;</p></li><li><p><strong>[0:02:33]</strong> &#8220;Can you go into more detail about how to handle IO with queues vs callbacks? A small example on the fancy light-board would be nice.&#8221;</p></li><li><p><strong>[0:07:14]</strong> &#8220;With the recent widespread usage of AI to program in the industry and a fierce push from upper management to enforce usage of AI in development tasks, I am finding myself less and less motivated to program, and also depressed. I have mostly ceased my programming activities outside of working hours.</p><p>Do you have any advice on how to deal with the AI wave and also to continue motivated to program as a hobby and professionally in this environment?&#8221;</p></li><li><p><strong>[0:20:00]</strong> &#8220;Now that you are switching to Linux in full swing have you ported your codebases or what&#8217;s your strategy? 
I did the opposite and fled from Linux because I wanted to start my first big codebase in a more stable environment, so it has been the year of the Windows desktop for me, which, after running a debloat script, hasn&#8217;t actually been bad. Nevertheless, seeing Microsoft&#8217;s behavior does have me yearning for the penguin, and knowing how painful releasing something for Linux is has me wondering if programming against the Windows API and using Wine for most of one&#8217;s graphical software is a valid way to proceed.&#8221;</p></li><li><p><strong>[0:29:50]</strong> &#8220;How much of the software you currently write is based on your own thoughts and creativity? I&#8217;ve found myself relying heavily on reading and using open-source projects where similar problems were already solved. After a while, this started to make me feel like an impostor. Do great programmers like yourself write their own code, or do they adapt and build on existing work?&#8221;</p></li><li><p><strong>[0:35:38]</strong> &#8220;This may be jumping the gun a fair bit, but when you say back-of-the-envelope calculation, does that include the time complexity of whatever algorithm is needed to do it correctly?&#8221;</p></li><li><p><strong>[0:39:45]</strong> &#8220;Have you ever written anything in ISPC? From my experience it seems much easier to write than rolling the SIMD myself, and much more portable.&#8221;</p></li><li><p><strong>[0:42:10]</strong> &#8220;Do you have any recommendations on how to use &#8216;pen &amp; paper&#8217; in the software development process? I&#8217;ve noticed that drawing a problem helps me visualize it more easily than jumping straight into coding. I&#8217;m working with MIDI, and writing a document explaining in my own words how MIDI encodes data helped me in unexpected ways. I&#8217;m wondering if I&#8217;m missing out on any other &#8216;pen &amp; paper&#8217; practice.
Any useful tips?&#8221;</p></li><li><p><strong>[0:46:47]</strong> &#8220;Hello, I come with another question. How does one go about implementing smooth window resizing with Dangerous Thread Crews? I have the two threads and do get non-blocking resizing, but the UI contents of the window jiggle and stretch a bit when resizing (which gets way worse with vsync), and nothing I try (like adding synchronization and using WM_WINDOWPOSCHANGING to allow only one resize per frame) gets rid of the apparent discrepancy between the window size and the size my renderer targets that causes some ugliness. The only reference I have of a program that has smooth resizing and doesn&#8217;t achieve this by updating on each WM_PAINT is your refterm program, but I don&#8217;t see any glaring differences in the way you rendered stuff there, and it seems like there you just called GetClientRect on each update with no need for anything else... so am I just doing something dumb and missing something, or is there some computer wizardry I need to invoke to fix this when programming more complex UIs?&#8221;</p></li><li><p><strong>[0:51:06]</strong> &#8220;In this talk about pathfinding in Age of Empires 2, around the 15-16 minute mark, he mentions turning off SIMD because they were losing floating point precision. I don&#8217;t understand why that was the case for them. Can you provide more insight?&#8221;</p></li><li><p><strong>[1:01:33]</strong> &#8220;I&#8217;ve been trying to test how many simultaneous loads I can get with the repetition tester when I do 1, 2, 3 or 4 moves in a loop on the two machines I have access to. My laptop (which has an Intel Skylake chip) reports that the memory bandwidth doubles only when I go from 1 to 2 moves per loop, which is expected and reproduces your results from the course. But when I do the same on my desktop (which has an Intel Raptor Lake chip), the results are the same.
Raptor Lake apparently has two types of cores: P-cores and E-cores, where E-cores also have 2 ports capable of executing loads, while P-cores are supposed to be equipped with 3 ports of that type (at least that&#8217;s what I read in Agner Fog&#8217;s manual). To my understanding it means that I should see a bandwidth bump when going from 1 to 2 moves per loop, and when going from 2 to 3 moves per loop. But that doesn&#8217;t happen; I see only one bump (from 1 to 2 moves). I guess that there are some nuances with running the tester on this system that I might not be aware of. But one of them, which is clear to me, is that it is the OS that decides which core the tester runs on. So I set the affinity of the tester to CPU1 to make sure that it runs on the P-core. Process Explorer confirmed that it runs on CPU1. But I still could not see the improvement when going from 2 to 3 memory reads. Then I repeated the test with all the cores (one at a time), but I saw no difference in the results.</p><p>It is either my test setup that&#8217;s completely broken, or some other factor that I can&#8217;t see which prevents the bandwidth improvement of a 3-read loop. I would be grateful if you could give me some pointers here.&#8221;</p></li><li><p><strong>[1:07:25]</strong> &#8220;What are your thoughts about the new dynamicdeopt in MSVC that lets you run optimized builds that you can debug with full information because they switch the executable on the fly?&#8221;</p></li><li><p><strong>[1:08:15]</strong> &#8220;I&#8217;d like to ask you about the pass-by-ref vs pass-by-value &#8216;debate&#8217;. Traditional C++ advice is &#8216;always pass by const&amp; anything bigger than 8 bytes&#8217;, but I&#8217;ve recently started seeing some people advocate that 16-byte structs should also be pass-by-value. I know that you couldn&#8217;t care less about const, and you do seem to pass small stuff by value without worrying too much about it in your own code, so ..
is there anything I&#8217;m failing to consider here? is this a stupid thing to worry about in the abstract, or is there some general principle that could be useful to keep in mind here?&#8221;</p></li></ul>
      <p>
          <a href="https://www.computerenhance.com/p/q-and-a-83-2026-03-11">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Dependency Chain Stalls]]></title><description><![CDATA[The CPU's ability to extract parallelism has its limits.]]></description><link>https://www.computerenhance.com/p/dependency-chain-stalls</link><guid isPermaLink="false">https://www.computerenhance.com/p/dependency-chain-stalls</guid><pubDate>Wed, 04 Mar 2026 03:31:39 GMT</pubDate><enclosure url="https://substack-video.s3.amazonaws.com/video_upload/post/189840675/0e97fb25-8257-4e41-9a4f-67b0d50eadf4/transcoded-77573.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AcnN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee94ce40-8301-48bb-be91-4060d4336723_1920x826.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AcnN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee94ce40-8301-48bb-be91-4060d4336723_1920x826.jpeg 424w, https://substackcdn.com/image/fetch/$s_!AcnN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee94ce40-8301-48bb-be91-4060d4336723_1920x826.jpeg 848w, https://substackcdn.com/image/fetch/$s_!AcnN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee94ce40-8301-48bb-be91-4060d4336723_1920x826.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!AcnN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee94ce40-8301-48bb-be91-4060d4336723_1920x826.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!AcnN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee94ce40-8301-48bb-be91-4060d4336723_1920x826.jpeg" width="1456" height="626" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ee94ce40-8301-48bb-be91-4060d4336723_1920x826.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:626,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:625880,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.computerenhance.com/i/189840675?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee94ce40-8301-48bb-be91-4060d4336723_1920x826.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AcnN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee94ce40-8301-48bb-be91-4060d4336723_1920x826.jpeg 424w, https://substackcdn.com/image/fetch/$s_!AcnN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee94ce40-8301-48bb-be91-4060d4336723_1920x826.jpeg 848w, https://substackcdn.com/image/fetch/$s_!AcnN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee94ce40-8301-48bb-be91-4060d4336723_1920x826.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!AcnN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee94ce40-8301-48bb-be91-4060d4336723_1920x826.jpeg 
1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><em>This is the ninth video in Part 5 of the Performance-Aware Programming series. Please see the <a href="https://www.computerenhance.com/p/table-of-contents">Table of Contents</a> to quickly navigate through the rest of the course as it is updated, and <a href="https://github.com/cmuratori/computer_enhance">the code repository</a> for downloadable code listings.</em></p>
      <p>
          <a href="https://www.computerenhance.com/p/dependency-chain-stalls">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Q&A #82 (2026-01-27)]]></title><description><![CDATA[Answers to questions from the last Q&A thread.]]></description><link>https://www.computerenhance.com/p/q-and-a-82-2026-01-27</link><guid isPermaLink="false">https://www.computerenhance.com/p/q-and-a-82-2026-01-27</guid><pubDate>Tue, 27 Jan 2026 23:58:15 GMT</pubDate><enclosure url="https://substack-video.s3.amazonaws.com/video_upload/post/186010676/f8099312-2207-4fc3-a203-77f3cc4aaad6/transcoded-03188.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>In each Q&amp;A video, I answer questions from the comments on the previous Q&amp;A video, which can be from any part of the course.</em></p><p>The questions addressed in this video are:</p><ul><li><p><strong>[01:02]</strong> &#8220;Could you suggest a resource to learn how to work with large datasets that don&#8217;t fit into VRAM, or even regular RAM?&#8221;</p></li><li><p><strong>[07:24]</strong> &#8220;I do not know how this compares to other approaches, as I&#8217;ve not really tried to optimize my profilers much as it&#8217;s not really my main area of work, but your profiling explanation made me think of a lock-debug profiler I built. It sounds similar to me at least. Do you agree?</p><p>In my solution, each worker thread owns its profiling buffers completely. I keep a thread-local pointer to a block-based buffer that grows in chunks when needed, and all writes are done by the thread with no locks. The only shared step is that when a thread creates its first block, it registers it once by pushing it into a global linked list so the UI can later iterate all threads. To avoid stalling the writers, I double-buffered it. Each thread allocates two buffers and writes into one of them.&#8221;</p></li><li><p><strong>[09:46]</strong> &#8220;About memory usage and the program stack, I believed that this stack could more easily fit in the L1 cache. 
Isn&#8217;t there a higher risk of cache misses at that level if the data being used has been allocated elsewhere? Would it even be noticeable in terms of performance?&#8221;</p></li><li><p><strong>[14:32]</strong> &#8220;What&#8217;s the best way to multithread a software rasterizer? In my experience, scan line interlacing per triangle was horrific.&#8221;</p></li><li><p><strong>[18:18]</strong> &#8220;Do you know why, in the file processing test on the same machine, mmapping the file on Linux could outperform all other methods on medium and large mapped chunk sizes, while on Windows the same test was worse than all the others?&#8221;</p></li><li><p><strong>[21:18]</strong> &#8220;Regarding callbacks, I think I understood the argument about moving things to a queue instead. However, I also can see the benefit of a callback because that&#8217;s synchronous. What if the file IO read some data and stored it in a buffer, and the callback now is supposed to handle that read data? If instead you move things to a queue, you&#8217;d have to keep allocating new buffers to keep working on those asynchronous reads, because you don&#8217;t know when the client code will handle those reads. If instead you can very quickly handle this buffer of read data synchronously in the callback, the file IO can reuse the same buffer to store the next chunk.&#8221;</p></li><li><p><strong>[25:45]</strong> &#8220;Can you steelman cases for which an arena/bump allocator (are these the same thing?) is not the preferred way to allocate memory (I imagine it is when lifetimes are not known a priori, but perhaps I am missing more subtlety)? In such cases, what is your preferred method of allocation?
Are you forced to go back to new/delete?&#8221;</p></li><li><p><strong>[33:01]</strong> &#8220;Re: Dead Code Elimination Prevention Macros, I&#8217;ve got identical results with `asm volatile (&quot;&quot; : &quot;+v&quot;(Value));` which tells the compiler that `Value` is both input and output, forcing it to initialize it, as well as preventing it from assuming a specific value. It would generate `vpxor` for `0.0` and `vmovaps` for `0.5`. The advantage here is that it&#8217;s quite generic and doesn&#8217;t depend on operand size/type. Could you please elaborate on your choice of explicitly using instructions?&#8221;</p></li><li><p><strong>[35:14]</strong> &#8220;Concerning callbacks, is there a benefit to using callbacks for print-outs/messaging? We have a part of the program that does some calculations which can take time, and it uses callbacks to notify the user about the progress, and waiting until the end of all the computations is not good enough. We also use a callback for a type of mesh calculation that depends on things that this other part of the program shouldn&#8217;t know about. (These things were not my decision, but I guess making the module as isolated as possible makes it easier to use it as a module in another program)&#8221;</p></li><li><p><strong>[37:42]</strong> &#8220;How do you determine if a solution to a problem is more complex than it needs to be? And for inherently complex and interconnected problems, how do you determine if it needs to be subdivided or not? Is there a general approach for working on a complex system with lots of moving parts? (other than me complaining about it :) )&#8221;</p></li></ul>
      <p>
          <a href="https://www.computerenhance.com/p/q-and-a-82-2026-01-27">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Dead Code Elimination Prevention Macros]]></title><description><![CDATA[Watch now (26 mins) | This is the eighth video in Part 5 of the Performance-Aware Programming series.]]></description><link>https://www.computerenhance.com/p/dead-code-elimination-prevention</link><guid isPermaLink="false">https://www.computerenhance.com/p/dead-code-elimination-prevention</guid><pubDate>Mon, 29 Dec 2025 17:03:03 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!s3T5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadaddecc-49f3-4dd3-98d5-816573e1352d_1920x754.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!s3T5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadaddecc-49f3-4dd3-98d5-816573e1352d_1920x754.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!s3T5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadaddecc-49f3-4dd3-98d5-816573e1352d_1920x754.jpeg 424w, https://substackcdn.com/image/fetch/$s_!s3T5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadaddecc-49f3-4dd3-98d5-816573e1352d_1920x754.jpeg 848w, https://substackcdn.com/image/fetch/$s_!s3T5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadaddecc-49f3-4dd3-98d5-816573e1352d_1920x754.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!s3T5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadaddecc-49f3-4dd3-98d5-816573e1352d_1920x754.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!s3T5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadaddecc-49f3-4dd3-98d5-816573e1352d_1920x754.jpeg" width="1456" height="572" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/adaddecc-49f3-4dd3-98d5-816573e1352d_1920x754.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:572,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:989517,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.computerenhance.com/i/182383215?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadaddecc-49f3-4dd3-98d5-816573e1352d_1920x754.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!s3T5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadaddecc-49f3-4dd3-98d5-816573e1352d_1920x754.jpeg 424w, https://substackcdn.com/image/fetch/$s_!s3T5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadaddecc-49f3-4dd3-98d5-816573e1352d_1920x754.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!s3T5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadaddecc-49f3-4dd3-98d5-816573e1352d_1920x754.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!s3T5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadaddecc-49f3-4dd3-98d5-816573e1352d_1920x754.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><em>This is the eighth video in Part 5 of the Performance-Aware Programming series.
Please see the <a href="https://www.computerenhance.com/p/table-of-contents">Table of Contents</a> to quickly navigate through the rest of the course as it is updated, and <a href="https://github.com/cmuratori/computer_enhance">the code repository</a> for downloadable code listings.</em></p>
      <p>
          <a href="https://www.computerenhance.com/p/dead-code-elimination-prevention">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Q&A #81 (2025-12-22)]]></title><description><![CDATA[Answers to questions from the last Q&A thread.]]></description><link>https://www.computerenhance.com/p/q-and-a-81-2025-12-22</link><guid isPermaLink="false">https://www.computerenhance.com/p/q-and-a-81-2025-12-22</guid><pubDate>Tue, 23 Dec 2025 00:10:14 GMT</pubDate><enclosure url="https://substack-video.s3.amazonaws.com/video_upload/post/182146727/3e1b2055-4dc9-4e9e-b1a5-55add88adf7f/transcoded-119548.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>In each Q&amp;A video, I answer questions from the comments on the previous Q&amp;A video, which can be from any part of the course.</em></p><p>The questions addressed in this video are:</p><ul><li><p><strong>[00:04]</strong> &#8220;What are your top 5 recommendations for must have projects or applications to build at least once in your life to make you a better programmer ?&#8221;</p></li><li><p><strong>[14:32]</strong> &#8220;My question is regarding your previous comment on stack below, where you mentioned that we don't want to do a lot with the stack for performance oriented code. Before this course, I had impression that I should prefer stack allocation to heap allocation (maybe just C++) because I read/watched somewhere saying that 1) stack allocation/dealloc is faster (just move the stack pointer) than heap allocation 2) stack object has more obvious lifetime, 3) stack has no fragmentation concern which improves locality 4) you call on stack object directly while heap object involves indirection.</p><p>I wonder whether my understanding above is correct or not. If not, does that mean I can prefer using heap over stack allocation most of the time? Could you please further comment? Thanks!&#8221;</p></li><li><p><strong>[32:48]</strong> &#8220;One idea is to avoid dynamic allocation in the critical path by pushing each measurement into a dedicated logging/profiler thread through a channel. 
The worker thread would record measurements while the computation threads only perform the minimal &#8216;send&#8217; operation. However, atomic operations, queues, and cross-thread communication can add overhead and also distort the original program execution &#8230; I&#8217;d like advice on how to judge whether this dedicated-thread idea is sound, and in general how to think about designing a low-overhead profiler.&#8221;</p></li><li><p><strong>[44:20]</strong> &#8220;What is your opinion on using callbacks for, say, signal handling? They often seem necessary but I may be overusing them, which leads me to believe my overall architecture could be flawed. But generally speaking, what&#8217;s your view on callbacks? Love them or hate them? Do you try to avoid them?&#8221;</p></li></ul>
      <p>
          <a href="https://www.computerenhance.com/p/q-and-a-81-2025-12-22">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Our Nemesis Returns]]></title><description><![CDATA[To avoid spoiling the surprise for people who have not yet done the homework, I cannot be any more specific in the title.]]></description><link>https://www.computerenhance.com/p/our-nemesis-returns</link><guid isPermaLink="false">https://www.computerenhance.com/p/our-nemesis-returns</guid><pubDate>Mon, 03 Nov 2025 23:19:53 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!yaoM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd957f3c7-f9b3-44ab-85cf-52e2fda1e1ff_1920x1089.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yaoM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd957f3c7-f9b3-44ab-85cf-52e2fda1e1ff_1920x1089.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yaoM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd957f3c7-f9b3-44ab-85cf-52e2fda1e1ff_1920x1089.jpeg 424w, https://substackcdn.com/image/fetch/$s_!yaoM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd957f3c7-f9b3-44ab-85cf-52e2fda1e1ff_1920x1089.jpeg 848w, https://substackcdn.com/image/fetch/$s_!yaoM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd957f3c7-f9b3-44ab-85cf-52e2fda1e1ff_1920x1089.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!yaoM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd957f3c7-f9b3-44ab-85cf-52e2fda1e1ff_1920x1089.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yaoM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd957f3c7-f9b3-44ab-85cf-52e2fda1e1ff_1920x1089.jpeg" width="1456" height="826" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d957f3c7-f9b3-44ab-85cf-52e2fda1e1ff_1920x1089.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:826,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:420478,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.computerenhance.com/i/177937774?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd957f3c7-f9b3-44ab-85cf-52e2fda1e1ff_1920x1089.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yaoM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd957f3c7-f9b3-44ab-85cf-52e2fda1e1ff_1920x1089.jpeg 424w, https://substackcdn.com/image/fetch/$s_!yaoM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd957f3c7-f9b3-44ab-85cf-52e2fda1e1ff_1920x1089.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!yaoM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd957f3c7-f9b3-44ab-85cf-52e2fda1e1ff_1920x1089.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!yaoM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd957f3c7-f9b3-44ab-85cf-52e2fda1e1ff_1920x1089.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><em>This is the seventh video in Part 5 of the Performance-Aware Programming series.
Please see the <a href="https://www.computerenhance.com/p/table-of-contents">Table of Contents</a> to quickly navigate through the rest of the course as it is updated, and <a href="https://github.com/cmuratori/computer_enhance">the code repository</a> for downloadable code listings.</em></p>
      <p>
          <a href="https://www.computerenhance.com/p/our-nemesis-returns">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Q&A #80 (2025-10-31)]]></title><description><![CDATA[Answers to questions from the last Q&A thread.]]></description><link>https://www.computerenhance.com/p/q-and-a-80-2025-10-31</link><guid isPermaLink="false">https://www.computerenhance.com/p/q-and-a-80-2025-10-31</guid><pubDate>Sat, 01 Nov 2025 03:18:21 GMT</pubDate><enclosure url="https://substack-video.s3.amazonaws.com/video_upload/post/177680510/727f848b-cbfd-4080-bd9e-2556b1252cc8/transcoded-62883.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>In each Q&amp;A video, I answer questions from the comments on the previous Q&amp;A video, which can be from any part of the course.</em></p><p>The questions addressed in this video are:</p><ul><li><p><strong>[00:05]</strong> &#8220;Question regarding the &#8216;fat struct&#8217; approach: do you ever find yourself thinking about excess memory consumption caused by some entities having unused fields?&#8221;</p></li><li><p><strong>[04:29]</strong> &#8220;About fat structs, you said we would initialize the struct to be able to be A or B depending on the situation, if you imagine that fat struct to be of type Result with cases Success or Error, how would you be able to initialize the Error in the case you have a Success as there was no Error, and vice-versa. This type for example exists in F# and it makes sense to initialize Success only or Error only.&#8221;</p></li><li><p><strong>[07:40]</strong> &#8220;Do you have any suggestions for an off-the-shelf tool to measure bandwidth and flops for a machine you run it on?&#8221;</p></li><li><p><strong>[13:10]</strong> &#8220;I sense a recurring theme of reliability and predictability &#8211; preferring simple control flow to early returns, preferring simple compilers to strict aliasing, preferring large blocks of memory to pointer festivals, etc.
You&#8217;ve also spoken about improving _robustness_ by preferring zero/dummy values that just flow through code to null pointers, and preferring handles to pointers (from the iCloud example we just saw).</p><p>What do you mean by &#8220;robustness&#8221;, and what other techniques can I use to make my code more robust in this way?&#8221;</p></li><li><p><strong>[18:00]</strong> &#8220;Do you have any thoughts on Apple&#8217;s approach to SIMD?&#8221;</p></li><li><p><strong>[20:10]</strong> &#8220;Hi Casey, on the recent podcast with Marco you said that if you could choose a single piece of software to be magically redesigned it would definitely be the browser because the software platform it defines is bad. Could you please elaborate on that? What are the main problems with today&#8217;s browsers from your perspective?&#8221;</p></li><li><p><strong>[22:30]</strong> &#8220;I&#8217;m a bit confused when analyzing the bandwidth I get when reading directly from the volume (using ReadFile with the path &#8216;\.\C:&#8217; with an offset, for example). As far as I know, that&#8217;s the correct way to read system files like the $MFT (which I can currently read properly, by the way).</p><p>When reading 20 GB of contiguous data from the beginning of the volume, I get 2.6 GB/s, and it doesn&#8217;t matter whether I use the FILE_FLAG_NO_BUFFERING flag or not&#8212;the result is the same. I&#8217;d expect something closer to a non-cached read (4.9 GB/s), but I&#8217;m getting the same throughput as a cold cached read. I&#8217;m not sure where this penalty comes from (assuming the read isn&#8217;t triggering extra cache operations since it&#8217;s non-buffered).</p><p>Any idea what might be going on here?
Do you think these read bandwidths make sense?&#8221;</p></li><li><p><strong>[25:46]</strong> &#8220;Will we be able to reuse the coefficients we&#8217;re currently using for f64 sine for f32 sine or will we need new ones?&#8221;</p></li><li><p><strong>[28:24]</strong> &#8220;I have a question about PC hardware components. I&#8217;m not clear on things like the motherboard and chipset. Do they play any role in performance? They vary a lot in price even with similar features, so I assume some aspects must affect overall system performance. Could you give a brief overview of these system parts, if possible?</p><p>Or, to rephrase my question: When you&#8217;re building a PC, what do you specifically look at besides just the CPU, RAM, disk and GPU in terms of performance? How do you decide what&#8217;s suitable for your specific builds?&#8221;</p></li><li><p><strong>[36:09]</strong> &#8220;In my code, I ended up having a single loop over the input that directly produced the haversine sum, rather than splitting parsing and math into two loops. But that means if I want to time parsing vs math, I have to put blocks into the loop, which (seemingly) inevitably introduces a lot of overhead.</p><p>Is there a good way to handle this? The best way I can think of is to instead temporarily comment out parts and just time the rest, though while that&#8217;s easy to do with the math part, it seems harder to do for the parsing part, since you still have to somehow produce dummy data for the math while making sure this doesn&#8217;t lead to any compiler optimizations you wouldn&#8217;t otherwise get.&#8221;</p></li><li><p><strong>[39:28]</strong> &#8220;Sorry for repeating the question from the last Q&amp;A, but here is: I have just gotten to it, did it and looked up your solution in QA47 to cross reference. 
There is one thing that we got differently, and I don&#8217;t quite understand your reasoning about it:</p><p>You said that shl rbx, 0 should be recognized by the frontend as a nop and not do anything with flags, but would produce rbx.</p><p>1) If the frontend sees it as a nop, why would it RAW the value of rbx, and not just be a pure nop?</p><p>2) I actually thought that it would not be recognized as a nop (I didn&#8217;t find anything about this kind of optimization, I presumed it would be somewhere near zero idiom stuff in the manual), and then it seems like shl will have not only a RAW on rbx, but also on all the flags, as it has to be ready that the ALU will say that the shift was 0 and the previous value of flags should be preserved (i.e. RAW)</p><p>So the question is, why is it that rbx is a RAW and flags are skipped, and do you know if there is any place in the docs where such a frontend optimization might be mentioned?&#8221;</p></li></ul>
      <p>
          <a href="https://www.computerenhance.com/p/q-and-a-80-2025-10-31">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Better Prevention of Dead Code Elimination - Or Is It?]]></title><description><![CDATA[The most straightforward way of isolating a section of optimized code has a hidden gotcha.]]></description><link>https://www.computerenhance.com/p/better-prevention-of-dead-code-elimination</link><guid isPermaLink="false">https://www.computerenhance.com/p/better-prevention-of-dead-code-elimination</guid><pubDate>Wed, 22 Oct 2025 20:47:25 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!A3vD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F432ddd68-15ae-4a28-a4ab-dc40c4a96688_2912x1133.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!A3vD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F432ddd68-15ae-4a28-a4ab-dc40c4a96688_2912x1133.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!A3vD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F432ddd68-15ae-4a28-a4ab-dc40c4a96688_2912x1133.jpeg 424w, https://substackcdn.com/image/fetch/$s_!A3vD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F432ddd68-15ae-4a28-a4ab-dc40c4a96688_2912x1133.jpeg 848w, https://substackcdn.com/image/fetch/$s_!A3vD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F432ddd68-15ae-4a28-a4ab-dc40c4a96688_2912x1133.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!A3vD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F432ddd68-15ae-4a28-a4ab-dc40c4a96688_2912x1133.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!A3vD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F432ddd68-15ae-4a28-a4ab-dc40c4a96688_2912x1133.jpeg" width="1456" height="567" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/432ddd68-15ae-4a28-a4ab-dc40c4a96688_2912x1133.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:567,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:385791,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.computerenhance.com/i/176867812?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F432ddd68-15ae-4a28-a4ab-dc40c4a96688_2912x1133.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!A3vD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F432ddd68-15ae-4a28-a4ab-dc40c4a96688_2912x1133.jpeg 424w, https://substackcdn.com/image/fetch/$s_!A3vD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F432ddd68-15ae-4a28-a4ab-dc40c4a96688_2912x1133.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!A3vD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F432ddd68-15ae-4a28-a4ab-dc40c4a96688_2912x1133.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!A3vD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F432ddd68-15ae-4a28-a4ab-dc40c4a96688_2912x1133.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><em>This is the sixth video in Part 5 of the Performance-Aware Programming series.
Please see the <a href="https://www.computerenhance.com/p/table-of-contents">Table of Contents</a> to quickly navigate through the rest of the course as it is updated, and <a href="https://github.com/cmuratori/computer_enhance">the code repository</a> for downloadable code listings.</em></p>
      <p>
          <a href="https://www.computerenhance.com/p/better-prevention-of-dead-code-elimination">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Q&A #79 (2025-09-28)]]></title><description><![CDATA[Answers to questions from the last Q&A thread.]]></description><link>https://www.computerenhance.com/p/q-and-a-79-2025-09-28</link><guid isPermaLink="false">https://www.computerenhance.com/p/q-and-a-79-2025-09-28</guid><pubDate>Mon, 29 Sep 2025 04:17:35 GMT</pubDate><enclosure url="https://substack-video.s3.amazonaws.com/video_upload/post/174646817/1658adc6-67d6-4d53-a840-0e2a6bff63ab/transcoded-368844.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>In each Q&amp;A video, I answer questions from the comments on the previous Q&amp;A video, which can be from any part of the course.</em></p><p>The questions addressed in this video are:</p><ul><li><p><strong>[00:00:09]</strong> &#8220;How far can we take SIMD use? What if we put a way bigger onus on eliminating branching and developing parallelizable code? The language K, the simdjson library, the Co-dfns compiler, and the new Box2D engine all blast competitors out of the water by thinking of data parallelism. SIMD use is arduous now - should we start designing our languages and APIs around it? What&#8217;s the path to broader cross pollination as an industry? It really seems an untapped potential, right?&#8221;</p></li><li><p><strong>[00:04:01]</strong> &#8220;Could you give examples of the kind of substrate work you&#8217;re hoping more people take seriously? Are you referring to teams like the WSL or Visual Studio performance teams, or something even deeper (or higher-level)? You&#8217;ve cited companies rewriting websites for performance and Microsoft&#8217;s console output claims - are these the kinds of things you mean? And if you mean something different, how do you think someone can get involved in that kind of work? I&#8217;m trying to understand the bigger vision.
Is it that everything on the internet runs as fast as the McMaster-Carr Supply Store website?&#8221;</p></li><li><p><strong>[00:07:45]</strong> &#8220;When you critique tools like Git, are you expressing a personal frustration, or do you think there&#8217;s an objectively better way these tools could work? It&#8217;s hard for me to imagine a world where I don&#8217;t need to memorize Git or AWS minutiae - are there products today that represent what you think &#8220;just working&#8221; should look like?&#8221;</p></li><li><p><strong>[00:13:00]</strong> &#8220;What do you think about coroutines?&#8221;</p></li><li><p><strong>[00:15:50]</strong> &#8220;Will a VOD of the &#8216;Research Overview for &#8220;The Big OOPs&#8221;&#8217; livestream be made available later?&#8221;</p></li><li><p><strong>[00:16:10]</strong> &#8220;Do you find that this performance aware development you are talking about also improves the quality of the software in general? Is there any connection? Further, once you decide that something needs to have its own test or tests, what techniques (as in test code organization, handling of test data etc.) and tools you find useful in creating those tests?&#8221;</p></li><li><p><strong>[00:23:18]</strong> &#8220;I&#8217;m currently onboarding at my first ever job. It&#8217;s a giant legacy PHP codebase which primarily uses OOP. I don&#8217;t think OOP is great. But if you were forced to use classes for everything, what would be the best way to do it?&#8221;</p></li><li><p><strong>[00:26:40]</strong> &#8220;It&#8217;s interesting how I get much better performance on Zen3 on Linux, than reference haversine even for the replacement case.&#8221;</p></li><li><p><strong>[00:30:47]</strong> &#8220;I&#8217;ve been watching nearly all the BSC talks, and they&#8217;ve made me realize I might have a wrong understanding of what a type is.
What would be the best definition?&#8221;</p></li><li><p><strong>[00:35:19]</strong> &#8220;It seems like there has been a push in recent years towards languages with stronger type systems and static analysis (such as Rust&#8217;s borrow checker). Do you think that this trend meaningfully improves software quality, and if so what static analysis tools (both existing and hypothetical) do you think would be the most beneficial for a performance-minded programmer?&#8221;</p></li><li><p><strong>[00:40:35]</strong> &#8220;I watched a YouTube video about the Montana mini-computer, and I understood how the concept of a function is implemented at the assembly level. I was wondering: what is a virtual function as defined in a high-level language, and how does that translate to a CPU? Along the same lines, I didn&#8217;t fully understand the concept of volatile. It seems to be related to the stack&#8212;could you explain how a volatile variable is represented at the assembly/CPU level?&#8221;</p></li><li><p><strong>[01:04:10]</strong> &#8220;With Intel AMX becoming more mature or widely-known about, do you think it is or will be possible to start doing things typically done on GPUs (texturing, filtering, convolving, etc) on CPUs in the future? Assuming it will be, do you think GPU vendors will finally start opening up &#224; la the 30 million line problem in a fight to remain competitive with CPU vendors?&#8221;</p></li><li><p><strong>[01:13:00]</strong> &#8220;Unlike games, a lot of the value in the Apple world comes from tight integration with all of their ecosystem and design language (ex: iphone widgets, watch now-playing view, siri searchability, sharing to other apps or airdrop, accessibility integration, etc...)
But of course Apple has a lot of OOP style and &#8220;declarative frameworks,&#8221; which means that I wouldn&#8217;t have control over the code that actually runs these features.</p><p>How can I still use a handmade philosophy of writing my own simpler more focused code rather than depending on lots of slow and volatile libraries?&#8221;</p></li><li><p><strong>[01:15:36]</strong> &#8220;do you have any advice for gracefully avoiding or recovering when Apple helpfully deletes your stuff in the background (kills your process when you switch apps, makes you redownload files from icloud)&#8221;</p></li><li><p><strong>[01:19:21]</strong> &#8220;In your talk at the Better Software Conference you mention the &#8220;fat struct&#8221; as a good default option for programming in a systems level language. I think I know the gist of what you mean by this, but I am curious if you have a slightly more formal definition for the term and a general explanation for why it&#8217;s a good default approach.&#8221;</p></li><li><p><strong>[01:27:32]</strong> &#8220;Most of the course so far talks about programs and assembly for actual chips. How do the aspects of concern under performance aware programming change, if at all, if the target is WASM?&#8221;</p></li><li><p><strong>[01:30:13]</strong> &#8220;I perfectly understand why writing to al in a loop is slower than writing to rax, but I get different results for al, ax, eax and rax. The loop writing to al and ax goes at nearly 1/4 of the loop writing to rax, but the eax loop goes at 1/2 that speed. Shouldn&#8217;t they run the same since writing to eax does not preserve the upper bits?&#8221;</p></li><li><p><strong>[01:31:01]</strong> &#8220;If there&#8217;s really no &#8216;rax&#8217; in a cpu at any given point of time, why (how?) debuggers show a single value when you stop them, or linux will only write a single value in the core dump file? Shouldn&#8217;t there be some kind of a tree?
I just don&#8217;t know if I can trust this information for debugging... What if it shows a register value from one branch, but the bug was caused by the value from another, won&#8217;t I be misled?&#8221;</p></li><li><p><strong>[01:40:21]</strong> &#8220;I was trying to reproduce your results from the RAT and register file lecture. I am running them on an Alder Lake chip (i7-12700H). My results were quite the opposite of yours: the add-only loop either had similar performance or ran much faster compared to the mov-and-add one. I found your article that hints that Alder Lake is able to decouple those chained adds.&#8221;</p></li></ul>
      <p>
          <a href="https://www.computerenhance.com/p/q-and-a-79-2025-09-28">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Reading CPU Diagrams]]></title><description><![CDATA[If you've followed the Performance-Aware Programming course up to this point, you already know everything you need to know to ballpark CPU performance with nothing more than IHV marketing slides.]]></description><link>https://www.computerenhance.com/p/reading-cpu-diagrams</link><guid isPermaLink="false">https://www.computerenhance.com/p/reading-cpu-diagrams</guid><pubDate>Wed, 20 Aug 2025 22:17:02 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!_TPy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F955cff14-04c3-4c35-a89b-f81a9525c2aa_1920x589.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_TPy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F955cff14-04c3-4c35-a89b-f81a9525c2aa_1920x589.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_TPy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F955cff14-04c3-4c35-a89b-f81a9525c2aa_1920x589.jpeg 424w, https://substackcdn.com/image/fetch/$s_!_TPy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F955cff14-04c3-4c35-a89b-f81a9525c2aa_1920x589.jpeg 848w, https://substackcdn.com/image/fetch/$s_!_TPy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F955cff14-04c3-4c35-a89b-f81a9525c2aa_1920x589.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!_TPy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F955cff14-04c3-4c35-a89b-f81a9525c2aa_1920x589.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_TPy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F955cff14-04c3-4c35-a89b-f81a9525c2aa_1920x589.jpeg" width="1456" height="447" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/955cff14-04c3-4c35-a89b-f81a9525c2aa_1920x589.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:447,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:669958,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.computerenhance.com/i/171311364?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F955cff14-04c3-4c35-a89b-f81a9525c2aa_1920x589.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_TPy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F955cff14-04c3-4c35-a89b-f81a9525c2aa_1920x589.jpeg 424w, https://substackcdn.com/image/fetch/$s_!_TPy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F955cff14-04c3-4c35-a89b-f81a9525c2aa_1920x589.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!_TPy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F955cff14-04c3-4c35-a89b-f81a9525c2aa_1920x589.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!_TPy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F955cff14-04c3-4c35-a89b-f81a9525c2aa_1920x589.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><em>This is the fifth video in Part 5 of the Performance-Aware Programming series.
Please see the <a href="https://www.computerenhance.com/p/table-of-contents">Table of Contents</a> to quickly navigate through the rest of the course as it is updated.</em></p>
      <p>
          <a href="https://www.computerenhance.com/p/reading-cpu-diagrams">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Q&A #78 (2025-07-21)]]></title><description><![CDATA[Answers to questions from the last Q&A thread.]]></description><link>https://www.computerenhance.com/p/q-and-a-78-2025-07-21</link><guid isPermaLink="false">https://www.computerenhance.com/p/q-and-a-78-2025-07-21</guid><pubDate>Mon, 21 Jul 2025 22:33:09 GMT</pubDate><enclosure url="https://substack-video.s3.amazonaws.com/video_upload/post/168897645/177a70f8-f447-4c1e-be75-53990d2f215a/transcoded-27507.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>In each Q&amp;A video, I answer questions from the comments on the previous Q&amp;A video, which can be from any part of the course.</em></p><p>The questions addressed in this video are:</p><ul><li><p><strong>[00:03]</strong> &#8220;Hi, I've seen you talk before about the state of GPU APIs, and I am aware that you were talking about an SoC solution to this and many other problems with current computers. My question is this: how would you design a GPU API if you had a magic stick that you could shake to make it appear and be common on all computers today? If I understand your position, you'd have the programmer talk directly (or at least through a paper-thin layer) to the GPU, but how would that work in practice? If that's not too much to ask, I'd love to see a short snippet of pseudocode of how it would work on the consumer side, thanks!&#8221;</p></li><li><p><strong>[04:10]</strong> &#8220;When dealing with async I/O, do you think an await-style interface like in JavaScript/Go is a good model, or do you think some simple state machine or any other method is better? I find that writing state machines for this kind of stuff gets overly explicit in a lot of cases. 
Thanks!&#8221;</p></li><li><p><strong>[07:52]</strong> &#8220;Given the course, I was trying to apply some techniques to my own toy problem, which, given a list of words and an NxN grid, tries to generate a word puzzle where each word is added horizontally, vertically or diagonally.</p><p>I tried to reduce pessimization as much as possible. The program performs a recursive search where each word added successfully to the grid increases the depth by one. Memory is pushed and popped with a single memory arena of max 512KiB. The hot code is the check that, given a word and the current board, determines whether the word will fit.</p><p>And I am having a hard time vectorizing this loop such that it's actually more performant. The single byte checks seem to outperform vectorization by a factor of 10, as the data is too sparse? I also could not find an AVX "scatter" function that does the opposite of a movemask_epi8. Was wondering if you have any thoughts on how one would optimize this further.&#8221;</p></li><li><p><strong>[13:03]</strong> &#8220;I was doing homework about asm volatile, and I noticed that all FMA instructions were using memory operands, and now I'm wondering why there are no FMA instructions with immediate operands? Surely it should be beneficial to bake values in some cases, right? Or maybe there are never immediate operands for floating-point instructions.&#8221;</p></li><li><p><strong>[17:54]</strong> &#8220;Will estimating the cost of more &#8216;branchy&#8217; workloads like lexing or JSON parsing be covered?</p><p>For context, I am trying to apply what I've learned from this course to optimize a lexer for a programming language and have gotten from ~0.8 GB/s to ~1.3 GB/s lexing the Linux source code. However, it seems impossible for me to get it to run any faster, even though 1.3 GB/s is nowhere near memory bandwidth and most of the work is just deciding what kind of token to spit out and how much to advance. 
It feels to me like ~2 GB/s or so could be the limit to how fast you could lex one token at a time, and going above that would require producing more than one token at a time, as I believe simdjson does. However, I have no clue if this is remotely correct, since my intuition about the cost of branchy code is provably very bad.&#8221;</p></li><li><p><strong>[20:12]</strong> &#8220;Do you have any good resources that outline how to improve a user's experience? The course emphasizes performance, but I'm curious what other things you would consider to improve a user's experience.&#8221;</p></li></ul>
      <p>
          <a href="https://www.computerenhance.com/p/q-and-a-78-2025-07-21">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Selectively Preventing Optimizations]]></title><description><![CDATA[When we want to microbenchmark code in a high-level language, we want almost all optimizations applied - except for the ones that would remove the code entirely.]]></description><link>https://www.computerenhance.com/p/selectively-preventing-optimizations</link><guid isPermaLink="false">https://www.computerenhance.com/p/selectively-preventing-optimizations</guid><pubDate>Sun, 29 Jun 2025 05:11:19 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!hsX2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F561b2ddf-8537-4134-b0f4-b6ce49c2942c_1920x1279.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hsX2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F561b2ddf-8537-4134-b0f4-b6ce49c2942c_1920x1279.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hsX2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F561b2ddf-8537-4134-b0f4-b6ce49c2942c_1920x1279.jpeg 424w, https://substackcdn.com/image/fetch/$s_!hsX2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F561b2ddf-8537-4134-b0f4-b6ce49c2942c_1920x1279.jpeg 848w, https://substackcdn.com/image/fetch/$s_!hsX2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F561b2ddf-8537-4134-b0f4-b6ce49c2942c_1920x1279.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!hsX2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F561b2ddf-8537-4134-b0f4-b6ce49c2942c_1920x1279.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hsX2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F561b2ddf-8537-4134-b0f4-b6ce49c2942c_1920x1279.jpeg" width="1456" height="970" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/561b2ddf-8537-4134-b0f4-b6ce49c2942c_1920x1279.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:970,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:430452,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.computerenhance.com/i/167083580?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F561b2ddf-8537-4134-b0f4-b6ce49c2942c_1920x1279.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hsX2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F561b2ddf-8537-4134-b0f4-b6ce49c2942c_1920x1279.jpeg 424w, https://substackcdn.com/image/fetch/$s_!hsX2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F561b2ddf-8537-4134-b0f4-b6ce49c2942c_1920x1279.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!hsX2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F561b2ddf-8537-4134-b0f4-b6ce49c2942c_1920x1279.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!hsX2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F561b2ddf-8537-4134-b0f4-b6ce49c2942c_1920x1279.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><em>This is the fourth video in Part 5 of the Performance-Aware Programming series. 
Please see the <a href="https://www.computerenhance.com/p/table-of-contents">Table of Contents</a> to quickly navigate through the rest of the course as it is updated. The listing referenced in the video (listing 196) is available <a href="https://github.com/cmuratori/computer_enhance/tree/main/perfaware/part4">on GitHub</a>.</em></p>
      <p>
          <a href="https://www.computerenhance.com/p/selectively-preventing-optimizations">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Q&A #77 (2025-06-19)]]></title><description><![CDATA[Answers to questions from the last Q&A thread.]]></description><link>https://www.computerenhance.com/p/q-and-a-77-2025-06-19</link><guid isPermaLink="false">https://www.computerenhance.com/p/q-and-a-77-2025-06-19</guid><pubDate>Fri, 20 Jun 2025 05:51:21 GMT</pubDate><enclosure url="https://substack-video.s3.amazonaws.com/video_upload/post/166360526/023acde1-d79a-4696-96ca-dba146b1d2ef/transcoded-98329.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>In each Q&amp;A video, I answer questions from the comments on the previous Q&amp;A video, which can be from any part of the course.</em></p><p>The questions addressed in this video are:</p><ul><li><p><strong>[00:02]</strong> &#8220;This is more of a learning personal project and I'm looking for something I can do that has a little bigger impact. Do you have some other tool/program/improvement you'd like to see but don't have the bandwidth to do yourself that you can share or just an avenue you'd like to see explored more that I could look into to produce something of value to other fellow programmers?&#8221;</p></li><li><p><strong>[02:57]</strong> &#8220;Hey Casey, I posted a question about non-text programming languages and fear I was a little late on the Q&amp;A cycle, Substack didn't send me an email for it :( , for the question to make it into this Q&amp;A's rotation. I am still very curious what your thoughts about a non-text language are, or if you have talked about them before and have a link to those articles/videos!&#8221;</p></li><li><p><strong>[03:51]</strong> &#8220;I wanted to know if you have already covered &#8216;False Sharing&#8217; from a cache point of view and its impact on performance. If not, do you have any plans for it? 
Do you plan to also cover effects of NUMA on performance?&#8221;</p></li><li><p><strong>[08:05]</strong> &#8220;I just rewatched your video on the GJK algorithm from 2006, and when you talk about the triangle case you say that, although there are six ifs and elses written down, you only ever execute three of them at most, so it's at most three tests and jumps. Since the logic only relies on dot products and cross products, I would assume that the multiplies and the adds would have a lesser impact on the performance compared to the penalty of mispredicting multiple branches so close to each other.</p><p>Am I getting something wrong?&#8221;</p></li><li><p><strong>[11:50]</strong> &#8220;According to you, what would be the impact of AI on performance-aware programming? I mean, is it likely to become more common that organizations / individuals offload this task to sophisticated AI coding agents? I know it's a very vague query with possibly no clear answers, but I just wanted to know what this community thinks about it.&#8221;</p></li><li><p><strong>[17:34]</strong> &#8220;Hi Casey, I was working on a simple CSV reader to analyze some logs, but I have many huge files and it takes a while. I wanted to try out Intel VTune, and it says that the code is mostly front-end bound and has 100% DSB misses. I was wondering what those are? Are they a problem? How can we solve them?&#8221;</p></li><li><p><strong>[23:10]</strong> &#8220;Why is memory management so obscure, and why are people inventing so many languages to (ostensibly) fix that, while criticizing C for being unsafe? Why are use-after-free, double-free, and so on the issues mentioned when talking about C being unsafe? 
In other words, why would I need to free memory when I know exactly how much memory a program will ever need and I can keep on reusing it (you can't know at a given time how much memory you need, but there is a limit, since you can't allocate infinite memory).</p><p>In real life, when we discover new concepts (like in physics), we don't invent a new language to explain those concepts in. Instead, we improve our current language / language practices.<br>No worries if you can't include everything from the msg in the Q&amp;A.&#8221;</p></li><li><p><strong>[29:33]</strong> &#8220;Hi Casey, before taking the course I would consider using SIMD instead of scalar operations and utilizing memory caches properly to be optimizations. But it seems like you prefer to think about them as performance-aware programming. Then I wonder what would be some concrete examples of optimizations? I just can't imagine something beyond that to get even more performance.&#8221;</p></li><li><p><strong>[32:59]</strong> &#8220;Hello, does anybody have any idea why there could be a huge difference between reading a file to mmap'ed memory and malloc'ed memory on Linux on an AMD Zen 3 chip?&#8221;</p></li></ul>
      <p>
          <a href="https://www.computerenhance.com/p/q-and-a-77-2025-06-19">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Simplified Haversine Candidates]]></title><description><![CDATA[Even for a computation as simple as our haversine loop, removing waste yields a surprisingly large performance improvement for very little effort.]]></description><link>https://www.computerenhance.com/p/simplified-haversine-candidates</link><guid isPermaLink="false">https://www.computerenhance.com/p/simplified-haversine-candidates</guid><pubDate>Sun, 15 Jun 2025 00:00:11 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08a56680-75ec-45de-8c92-223125bcac35_5616x2535.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dboe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08a56680-75ec-45de-8c92-223125bcac35_5616x2535.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dboe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08a56680-75ec-45de-8c92-223125bcac35_5616x2535.jpeg 424w, https://substackcdn.com/image/fetch/$s_!dboe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08a56680-75ec-45de-8c92-223125bcac35_5616x2535.jpeg 848w, https://substackcdn.com/image/fetch/$s_!dboe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08a56680-75ec-45de-8c92-223125bcac35_5616x2535.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!dboe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08a56680-75ec-45de-8c92-223125bcac35_5616x2535.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dboe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08a56680-75ec-45de-8c92-223125bcac35_5616x2535.jpeg" width="1456" height="657" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/08a56680-75ec-45de-8c92-223125bcac35_5616x2535.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:657,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1489541,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.computerenhance.com/i/165824301?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08a56680-75ec-45de-8c92-223125bcac35_5616x2535.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dboe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08a56680-75ec-45de-8c92-223125bcac35_5616x2535.jpeg 424w, https://substackcdn.com/image/fetch/$s_!dboe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08a56680-75ec-45de-8c92-223125bcac35_5616x2535.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!dboe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08a56680-75ec-45de-8c92-223125bcac35_5616x2535.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!dboe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08a56680-75ec-45de-8c92-223125bcac35_5616x2535.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><em>This is the fifteenth video in Part 4 of the Performance-Aware Programming series. 
Please see the <a href="https://www.computerenhance.com/p/table-of-contents">Table of Contents</a> to quickly navigate through the rest of the course as it is updated. The listings referenced in the video (listings 194 and 195) are available <a href="https://github.com/cmuratori/computer_enhance/tree/main/perfaware/part4">on GitHub</a>.</em></p>
      <p>
          <a href="https://www.computerenhance.com/p/simplified-haversine-candidates">
              Read more
          </a>
      </p>
   ]]></content:encoded></item></channel></rss>