9 Comments
Mouton Guillaume

Such an interesting read! Thank you for sharing it with us!

Andrejus Antoninovas

Stupid me, I forgot I have a laptop with a 12700H. I'm not sure what the difference is between mobile and desktop CPUs, but I could test things on Windows 11 and Linux.

Dancho Makaveev

Another Casey banger :)! Thank you!

Tobias

With discoveries like this, every passing day my joking claim that Assembly is really a declarative language rings more and more true. Thank you for these deep dives!

Nick Vaughn

I definitely initially misread the title as “The Case of the Missing Excrement” and was *really* confused

Casey Muratori

Since I had to use Event Tracing for Windows, I can assure you that not only was the excrement not missing, it was present in abundance.

- Casey

Daniel Bendix

I have an Alder Lake CPU, and observed this when doing the homework for part 3.

I have also tested this on a Raptor Lake CPU, which follows the exact same pattern.

I'm using Google Benchmark, since it makes it really easy to read the performance counters you want, without recompiling.

I was able to create a few other benchmark programs that probe a bit at how the CPU executes these things. I've observed that the front-end will fuse these instructions with a jump when they occur contiguously, at which point only a single immediate addition or subtraction can execute per cycle. Adding a nop prevents the fusion, and the optimization happens again. So a question would be how often the CPUs leverage this optimization in the wild.

I also found something in Agner Fog's microarchitecture manual, in the section about Alder Lake, where it says: "Integer addition with a small immediate constant has zero latency in some cases."

I've created a gist with my benchmark program, and the output from running these on Alder Lake and Raptor Lake CPUs, with a few relevant performance counters: https://gist.github.com/danielbendix/a377a976e62b6e8a8ea9c93636f0ff1e

Let me know if there is anything you'd really like tried on these, and I'll see what I can do.
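
For readers who want to try the fusion experiment described in this comment, here is a rough sketch of the idea. It is not the code from the gist, and it does not use Google Benchmark. The function names (`add_loop_adjacent`, `add_loop_with_nop`, `seconds_for`) are made up for illustration, and the inline assembly assumes GCC or Clang targeting x86-64 with the default AT&T syntax. On other targets it falls back to a plain loop that only keeps the example compiling. One loop keeps `dec` directly in front of `jnz`, so the pair is a candidate for macro-fusion; the other puts a `nop` between them.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#define HAVE_X86_64_INLINE_ASM 1
#else
#define HAVE_X86_64_INLINE_ASM 0
#endif

// Hypothetical helper: adds 1 to a register once per iteration, with the
// loop counter's `dec` immediately followed by `jnz` (a fusion candidate).
std::uint64_t add_loop_adjacent(std::uint64_t iterations) {
    assert(iterations > 0);  // a zero count would wrap around in the asm loop
    std::uint64_t value = 0;
#if HAVE_X86_64_INLINE_ASM
    std::uint64_t counter = iterations;
    asm volatile(
        "1:\n\t"
        "add $1, %[value]\n\t"
        "dec %[counter]\n\t"
        "jnz 1b\n\t"
        : [value] "+r"(value), [counter] "+r"(counter)
        :
        : "cc");
#else
    // Portable fallback: same result, but tells you nothing about fusion.
    for (std::uint64_t i = 0; i < iterations; ++i) value += 1;
#endif
    return value;
}

// Hypothetical helper: same loop, but a `nop` separates `dec` from `jnz`.
// `nop` leaves the flags alone, so the branch still tests the `dec` result.
std::uint64_t add_loop_with_nop(std::uint64_t iterations) {
    assert(iterations > 0);
    std::uint64_t value = 0;
#if HAVE_X86_64_INLINE_ASM
    std::uint64_t counter = iterations;
    asm volatile(
        "1:\n\t"
        "add $1, %[value]\n\t"
        "dec %[counter]\n\t"
        "nop\n\t"
        "jnz 1b\n\t"
        : [value] "+r"(value), [counter] "+r"(counter)
        :
        : "cc");
#else
    for (std::uint64_t i = 0; i < iterations; ++i) value += 1;
#endif
    return value;
}

// Wall-clock time for one call, in seconds. Cycle counts and uop counts
// need hardware performance counters, which this sketch does not read.
double seconds_for(std::uint64_t (*loop)(std::uint64_t), std::uint64_t iterations) {
    const auto start = std::chrono::steady_clock::now();
    volatile std::uint64_t sink = loop(iterations);
    (void)sink;
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling `seconds_for(add_loop_adjacent, n)` and `seconds_for(add_loop_with_nop, n)` with a large `n` gives a first, noisy comparison. Wall-clock time alone cannot confirm what the front-end is doing, so treat any difference as a hint and check it against performance counters.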

sqrt_negative1

Great article, thanks for sharing this research! And, sorry you had to go through the Ultimate Sadness...

On another note, I wonder why they didn't stick with this for newer processors? Maybe it was only something they experimented with in Golden Cove, and it turned out not to be as beneficial?

Casey Muratori

I am not certain which processors have this, since I was only able to test Golden Cove. It's possible that it does happen in some other Intel processors!

- Casey
