4 Comments

i'm very happy you've started a longer-form, text-based, programming-centric habit of communication, instead of making me parse your twitter threads. thanks, casey!

As one who goes to great lengths to avoid Twitter, I concur!

Found this interesting video that talks about an actual implementation (in hardware, as far as I can tell), and about exactly this problem (timestamp link): https://youtu.be/WzID6kk8RNs?t=567

The presentation is by Roger Espasa; from what I saw on the mailing list, he's the co-chair of the RISC-V Vector work group / committee / thingy. He does mention the extra wiring needed to connect the lanes, and a lot of added complication for an out-of-order core, but it's hard (for me) to gauge the actual complexity from the presentation (like, did they work for 3 years on just this problem, or was it work as usual?).

I had watched this before, but not closely, so I didn't remember that Roger had discussed this exact issue in that talk. Thank you for linking with the timestamp! It sounds like for an OoO core specifically (which you want for perf), the problem is worse than the basic single shadow copy Casey talks about here, as you need a whole bunch of shadow copies in that case... yikes!
