Response to a Reporter Regarding "Clean Code, Horrible Performance"
Since I have a lot to say on this topic, it seemed like posting my response to The New Stack's questions might make a good preview.
I was recently contacted by The New Stack and asked for comment on my "Clean" Code, Horrible Performance video. The writer asked:
I'm hoping to write an article for the New Stack about your "Clean" code, horrible performance post (and the resulting discussions it provoked online). Have you heard any interesting critiques of your thesis — or any new insights that made you revise your thesis?
Maybe you could give me your own summary of the reactions it received. (Did you see a surge in sign-ups for your Performance-Aware Programming series?) I watched your interview with ThePrimeTime, and in general it seems like this topic provoked a big reaction. Is there some larger, latent frustration that's been waiting to come out?
Here is my response in full:
The "Clean" Code, Horrible Performance video was material from the beginning of my Performance-Aware Programming series. In that part of the course, I am showing simple examples of large performance differences. The goal is to quickly demonstrate the magnitude of the performance cost for each programming style choice a programmer makes. There are videos demonstrating the effects of waste, IPC, SIMD, multithreading, and so on. This particular video was meant to underscore just how bad virtual functions can be for performance, but not to explain why, since that comes later in the course once we have covered the prerequisites.
Because the video was a demonstration and not an explanation, it led to some genuine confusion when it became very popular outside of the course. Some people didn't understand what was happening in the video, and that is understandable. Some people thought that the major performance differences I demonstrated came from indirect function calls having a performance penalty. While true, that is actually a minor part of it. The more important aspect is that both the compiler and the programmer have difficulty optimizing across virtual function calls, because inheritance hierarchies often prevent efficient optimization due to how languages specify their behavior.
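To make the distinction concrete, here is a minimal sketch of the two styles in the spirit of the video's shape example. The names and constants are mine, not the course's actual code; the point is only that in the switch version every case is visible at the call site, while the virtual call is an opaque boundary the optimizer generally cannot see through.

```cpp
#include <cassert>
#include <cmath>

// Closed-set version: the compiler sees every case at the call site,
// so it can inline the function, hoist work out of loops, and even
// vectorize across the "dispatch".
enum ShapeType { Square, Rectangle, Triangle, Circle };

struct Shape {
    ShapeType Type;
    float Width;
    float Height;
};

// Area as one flat function over a closed set of types.
static float Area(const Shape &S) {
    switch (S.Type) {
        case Square:    return S.Width * S.Width;
        case Rectangle: return S.Width * S.Height;
        case Triangle:  return 0.5f * S.Width * S.Height;
        case Circle:    return 3.14159265f * S.Width * S.Width;
    }
    return 0.0f;
}

// Open-set version: each call goes through the vtable. Without
// whole-program information, the compiler must treat the call as
// opaque and cannot inline or reorder work across it.
struct ShapeBase {
    virtual ~ShapeBase() {}
    virtual float Area() const = 0;
};

struct SquareShape : ShapeBase {
    float Side;
    explicit SquareShape(float S) : Side(S) {}
    float Area() const override { return Side * Side; }
};
```

Both versions compute the same areas; the difference the video measures comes from what the compiler and programmer can do around these calls, not from the arithmetic inside them.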
Some performance-minded people objected to the demonstration because I did not show a segregated array case, which would have been faster than any of the cases I showed. This kind of complaint came mostly from “Data Oriented Design” proponents. I do not disagree with them at all. I love that kind of design, and I use it myself often. The reason I did not show that in this video was because I was trying to demonstrate the performance difference between methods of handling data whose type must be determined dynamically. If this video were meant to be a complete analysis of the topic, rather than a simple demonstration, I would have used real-world examples like graph traversal, where types cannot be easily segregated into separate arrays without a lot of extra work. But this video was merely meant to show the performance difference, so I did not include all of that additional code, as it would further obfuscate what I was trying to demonstrate.
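For readers unfamiliar with the term, a segregated-array layout looks roughly like the following sketch (the struct names are mine, for illustration only). Instead of one array of mixed types with per-element dispatch, you keep one array per type, so each loop is branch-free and trivially vectorizable; the catch, as noted above, is that it only works when the problem lets you separate the types up front.

```cpp
#include <cassert>
#include <vector>

// One homogeneous array per type -- no type tag, no dispatch.
struct Squares { std::vector<float> Side; };
struct Circles { std::vector<float> Radius; };

static float TotalArea(const Squares &Sq, const Circles &Ci) {
    float Sum = 0.0f;
    // Each loop body is the same operation on every element, which is
    // exactly the shape of code compilers vectorize well.
    for (float S : Sq.Side)   Sum += S * S;
    for (float R : Ci.Radius) Sum += 3.14159265f * R * R;
    return Sum;
}
```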
Again, all of these things are covered later in the course in great detail. But for now, suffice it to say, I would never recommend to someone practicing Data Oriented Design that they should change their segregated array code to a switch statement! That would be madness. Switch statements are for code where the structure prevents total segregation of types — either because the problem is not expressible that way, or because the programmer is working on a system already designed in such a way as to make that change infeasible.
But regarding the larger conversation going on about “performance vs. maintainability”, there are several points that I would like to address in subsequent videos. It appears as if there is a large knowledge gap in the industry more broadly. Some of the things I saw that I consider to be factually false include:
That "clean code" is whatever someone says it is in the moment. If you search for "clean code", or if you look at what is taught to new programmers in school, or examine the coding standards enforced in corporate programming environments, there is clearly a set of dominant principles associated with the term "clean" that includes things like using inheritance hierarchies and avoiding switch statements. Unfortunately, these facts don’t change just because someone says, "well that's not what I mean when I say clean code". We have to accept that the vast majority of new programmers are being told that "clean code" includes using inheritance hierarchies and avoiding if statements, because that is the reality of the situation. Obviously, I too would prefer that "clean code" did not include these things, so I'm not disagreeing in principle with the people who say this. But I am disagreeing with them about what the standard definition is, and I think typing "clean code" into Google or YouTube proves my point.
That using millions of classes with very tiny functions leads to a more readable, more flexible, or less buggy codebase, and so it is a "tradeoff" that is worth making even if it means the code is massively slower. I believe you can demonstrate that in practice, the opposite is true. Code that is reasonable for a CPU to process is often easier for humans to process, too, because it is simpler to understand when you need to debug or modify it. I believe this particular misguided notion about “cleanliness” comes from comparing “clean code” to bad alternatives, rather than to good alternatives, which do exist. Or, perhaps people believe that reasonable performance can only be achieved via heavily optimized code, which often can be hard to read or modify. But that is simply not necessary. Code with reasonable performance is actually very easy to read and modify, and clearly there are many people who haven't been exposed to that kind of code for some reason.
That advocates for these particular "clean code" principles understand the performance tradeoffs of their ideas, and have simply chosen a particular tradeoff. From what I’ve seen in the discussion, it seems that “clean code” advocates have little or no knowledge of how CPUs work, nor of the magnitude of the performance costs of their ideas. This goes far beyond just misunderstandings about virtual functions. I have seen comments from people who dismissed the video that are very worrisome, because they would appear to require a complete lack of modern CPU and compiler knowledge. These comments include things like "well why don't you just program everything in assembly language, then". They apparently do not know that today, in all but the most isolated and rare cases, there is no performance benefit to hand-written assembly. C/C++ code can almost always be written such that the resulting assembly is either as fast, or almost as fast, as if it were hand-written. I assume these people are thinking of how compilers and CPUs worked in the 1990s, when that statement might have made sense. Similarly, I've seen comments like "virtual functions and switch statements are both implemented using a jump table, so their performance should be similar". Not only will the compiler-generated implementations of a virtual function and a switch statement generally differ in performance, but the more important part is that compilers can optimize across switch statements, whereas they cannot optimize across virtual function calls. So the performance differences can go far beyond just the implementation of the dispatch itself. In fact, if you were to construct a "worst case" for this, you could easily demonstrate performance differences that were orders of magnitude higher than what I showed in the video! And those are just a small sampling.
There are other concerning comments that would take too much elaboration to include here, but which I’d like to go over in detail in future videos.
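The "optimize across" point above can be sketched in a few lines. This is my own illustrative example, not code from the video: the two loops below compute the same sum, but in the switch version the compiler can inline `Value`, prove the body is pure, and fold or vectorize the loop, while in the virtual version every iteration contains an opaque indirect call it must assume can do anything.

```cpp
#include <cassert>
#include <vector>

// Closed-set dispatch: every case is visible, so the call inlines away.
enum Kind { A, B };

static int Value(Kind K) {
    switch (K) {
        case A: return 1;
        case B: return 2;
    }
    return 0;
}

static int SumClosed(const std::vector<Kind> &Kinds) {
    int Sum = 0;
    for (Kind K : Kinds) Sum += Value(K); // inlinable, foldable
    return Sum;
}

// Open-set dispatch: each iteration is an indirect call through the
// vtable; without devirtualization, nothing moves across it.
struct Node {
    virtual ~Node() {}
    virtual int Value() const = 0;
};
struct NodeA : Node { int Value() const override { return 1; } };
struct NodeB : Node { int Value() const override { return 2; } };

static int SumOpen(const std::vector<Node *> &Nodes) {
    int Sum = 0;
    for (const Node *N : Nodes) Sum += N->Value(); // opaque boundary
    return Sum;
}
```

Both functions return identical results; the difference is entirely in what the optimizer is permitted to do to the loop around the call.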
That by changing from a virtual function to a regular function with a switch statement, you change from "open set" to "closed set" polymorphism, so it is a reduction in feature set and isn’t comparable. This is false. The reason people believe this is because they have been taught to think in terms of words like "polymorphism" instead of thinking about how a program actually works. In truth, switching between these two approaches is an API transposition. It goes from "open set" on types and "closed set" on functions, to "open set" on functions and "closed set" on types. In other words, it trades the ability for a third party (without access to the code base) to add new types for the ability to add new functions. So you do not lose something, you trade something, which is a very important distinction. It is especially important considering the fact that in most systems, you add operations far more frequently than you add types. So claiming one is inherently more fully featured than the other is simply false. They do have different features, but neither is a superset of the other.
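The transposition is easy to see in code. In this sketch (names are mine), the enum + switch style closes the set of types but opens the set of functions: a third party can add a brand-new operation as a free function without touching any existing code — exactly the axis the class hierarchy closes.

```cpp
#include <cassert>

enum ShapeType { Square, Circle };
struct Shape { ShapeType Type; float R; };

// Existing operation over the closed set of types.
static float Area(const Shape &S) {
    switch (S.Type) {
        case Square: return S.R * S.R;
        case Circle: return 3.14159265f * S.R * S.R;
    }
    return 0.0f;
}

// A third party adds a new operation: just one more function, with no
// edits to Shape or to Area.
static float Perimeter(const Shape &S) {
    switch (S.Type) {
        case Square: return 4.0f * S.R;
        case Circle: return 2.0f * 3.14159265f * S.R;
    }
    return 0.0f;
}

// With a class hierarchy the axes flip: a third party can add a new
// *type* (a new derived class) without touching existing code, but a
// new *operation* means adding a virtual method to every class.
```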
There were many more, but that is a reasonable cross-section. I plan to post videos that explore all of these issues when I have time to do so.
That said, in summary I would say that the response to the video has actually been rather encouraging. I have received a huge number of "thank you" notes from people who disagree with “clean code” principles not just because of their poor performance, but because they do not deliver the supposed benefits in practice (readability, maintainability, etc.).
So some of the response was concerning because of how many dissenting opinions appeared to be based on factual errors, but overall I was encouraged by the surprisingly large number of people who dislike “clean code” and want to do something about it. That perhaps bodes well for the future, and indicates that perhaps there is a great deal of “latent frustration that's been waiting to come out”, as the original question suggested.
If you’d like to sign up for my Performance-Aware Programming series, you can do so here:
I have been programming for about 40 years now. Unfortunately, I have been sucked into many programming fads over the years. Over time, however, my practical programming experience has converged almost entirely on principles nearly identical to what Casey and Jonathan Blow have been talking about for years, and which are reflected in the content of this course: straightforward programming for the machine is almost always easier to understand and remarkably efficient. Unfortunately, this insight has been almost entirely lost in today's programming scene.
Telling people to "roughly" program like they did in the 80s and 90s (when computers had fewer resources) doesn't sell books, videos, or consulting gigs. Ironically, Casey can make an educational series like this to fill the gap and correct the misconceptions, and get folks to pay up. It's not that fashions go in cycles -- Casey is espousing eternal truths of programming: programs all run on real, physical machines. This is not a fad but a way to help people reconnect with fundamental truths.
I read a comment on your video in the r/gamedev subreddit from someone who claims that compilers can "very easily" vectorise code, that the compiler will "de-virtualize most of the virtual calls", and that the compiler will generate the "exact same code" for a switch statement and a vtable. It also seems he or she didn't understand the code that exploits instruction-level parallelism and thought it was just loop unrolling. I wouldn't have expected that from a game developer. But this is something I have been noticing more and more often: there is a sort of "compiler-oriented" thinking, a belief that the compiler will be able to completely understand the intention of the programmer and cut through the many layers of abstraction in their mental model.
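For readers who haven't seen the kind of code the comment refers to, here is a minimal sketch of a multiple-accumulator sum (my own example, not the course's). It is not "just" unrolling: the four independent accumulators break the single serial dependency chain, letting the CPU keep several additions in flight per cycle.

```cpp
#include <cassert>
#include <cstddef>

static long long SumILP(const int *Data, size_t Count) {
    // Four independent dependency chains instead of one.
    long long S0 = 0, S1 = 0, S2 = 0, S3 = 0;
    size_t I = 0;
    for (; I + 4 <= Count; I += 4) {
        S0 += Data[I + 0]; // these four adds do not depend on
        S1 += Data[I + 1]; // each other, so the CPU can execute
        S2 += Data[I + 2]; // them in parallel
        S3 += Data[I + 3];
    }
    for (; I < Count; ++I) S0 += Data[I]; // leftover elements
    return S0 + S1 + S2 + S3;
}
```

Unrolling with a single accumulator would reduce loop overhead but still serialize every add through one register; the separate accumulators are what expose the instruction-level parallelism.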