Many programming "best practices" taught today are performance disasters waiting to happen.
Thanks for making this public - I fight these battles at work all the time, and it's nice to have a concise POV from someone who corroborates what I can show through experimentation.
The pushback I imagine getting on this is about maintainability, "Developer Experience" issues, etc., which I don't have the experience to refute. But at least I can measure improvement - proponents of those other issues kinda can't.
I have been fighting against the "clean code" culture since the early '90s, when the OOP culture established itself, and Java later made it a lot worse. I can guarantee that, at least in the context I work in, there is no way the performance-oriented code would be accepted; it would be labelled as "ugly", "un-maintainable", "difficult to read"; the lost performance would be considered a price well worth paying to have "beautiful" code; and your reputation would be ruined just for proposing the more efficient alternative. It seems today most programmers write code as if users were going to hang pictures of it in their homes. Users don't care, and what they get instead is software that drains their batteries faster and is hence more expensive to run.
I think that the energy perspective could be a factor in getting more attention paid to software performance. For example, in Ireland they have huge problems with data centres that need more power than the national grid can provide, so they sometimes have to deny power to other businesses to keep the data centres running. At the moment, the focus is on minimising the amount of stored data; but, ultimately, these data centres run software possibly made up of huge stacks of tomcat/Spring/Hibernate/JVM/docker/serverless etc, the apotheosis of bloat and inefficiency. If the pressure to write efficient software came externally from authorities instead of from a lone team member, then we could start to see a change.
Casey is really nailing his theses to the door in this video.
Fantastic! It's really a relief to see someone make this case methodically and non-ideologically. (Your point about that switch statement is right on - sometimes just laying stuff out sequentially makes it obvious where stuff can be cut down.)
i've sat in on conference talks where the guy up at the front took literally 5 lines of code in a single file and blew it up into like 50 lines across 10 files, and his justification for it was, verbatim, 'we want the next programmers who see our code to think we are smart'. It all felt very wrong, and yet i assumed he must know something i didn't - after all, he's presenting and i'm in the audience.
So, i dutifully worked for several months rebuilding a project following these clean code principles, and just found it created a lot of new spaghetti, duplication, and piles of boilerplate where everything had once been so direct and readable. Polymorphism especially - it *seems* like such a powerful idea, but ends up making the flow of information significantly harder to follow. Ended up abandoning the rebuild and simply refactoring my 'unclean' code in a last-minute marathon. Made me feel like a bad programmer, like i'm hiding some secret shame. There's such weird peer pressure around it now, seems to mount a little each year - with some people, it's like the code equivalent of saying you're a Trump voter or something.
Anyway, i'm subscribed now, gonna go back through the previous ones and watch through. i appreciate your pro presentation skills and clarity.
It's nice to hear confirmation of programming practices that I've also found problematic.
For example, I have a note in our company handbook to disregard linters' warnings about "cyclomatic complexity", because resolving the warning typically involves extracting a "small" function that does "one thing" - but after this extraction the code is harder to read, harder to refactor (like when the area-calculation logic is in multiple files in the video's "clean" code), etc.
We outsource a lot, and my code reviews after handoff involve undoing a lot of these practices. This typically results in less code that is easier to read and maintain. I'm not sure what these practices are providing, except maybe to prevent inexperienced programmers from writing something really incomprehensible and buggy (but they should be given more coaching and supervision in that case, instead of being given "clean code" principles to follow).
Probably this goes outside the scope of the course (maybe better suited for an architecture course) but talking about "SOLID" principles and their downsides as you just did with "Clean" code could be great.
One thing that gets me is the staggering selfishness of the "developer experience" people. Not only are their pronouncements usually totally untested, not only do they constantly cite some Big Other (who would certainly know if these practices didn't work!), but the one that hurts me the most is "my time is worth more than the computer's time" or flippant remarks about engineering salaries. Even if "clean code" does make development faster (unproven!), to think that a couple hundred bucks of engineering salary is worth actual human lifetimes of loading screens (which are easy to reach at even modest user counts) is just beyond the pale to me.
Always found it amusing that using dynamic dispatch, and being in the dark about how things work is 'clean.' :)
Great stuff, thank you.
Playing the devil's advocate here
I don't think clean code was ever about performance, so was this comparison unfair? I'm not saying the performance comparison shouldn't be made. But maybe the appropriate comparison would be by readability and maintainability. To "beat" clean code, one should show performance-aware programming is more readable and maintainable, right? Even in large code bases. Give a slightly more complicated performance-aware program to some programmers. How fast and how well can they understand and add features to the given program? Do you have these measurements?
Of course, the more the programmers are trained on clean code, the harder it will be for them, I'm aware. You're probably suggesting a whole revision of the programming education "ecosystem". Starting from "data transformations" instead of "objects and what they do" in the early stages when teaching programming. The latter seems so natural and intuitive (or maybe I just forgot the struggles when I learned OOP). That's quite an effort.
On another note: another way clean code is practiced is by following "don't write code for the computer, write code for other programmers". It's almost saying "programmers don't know or don't care about what the computer is actually doing". That seems to be the general mindset which causes the 1000x slower software.
I think one thing the lecture should have shed some more light on, or even dedicated an entire video to, is the fact that these "clean" code rules fundamentally go in the opposite direction of what the CPU wants you to do to run more efficiently. Virtual functions, arrays of pointers to objects, separation of concerns, each object knowing how to do the work on its own in isolation instead of doing the work on multiple objects together, etc.: all these ideas fundamentally go against the features that the CPU has to improve performance. These ideas will confuse out-of-order execution, pollute the cache by following pointers, and make it difficult to load data into SIMD registers.
Because of that, the more someone applies these ideas, the worse it is for the CPU today or even in the future.
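To make the contrast above concrete, here is a minimal C++ sketch of the two layouts being discussed. It is not the code from the video: the type and function names (`ShapeBase`, `FlatShape`, `kAreaCoefficient`, and so on) are made up for illustration, and only two shape kinds are shown. The first version reaches each object through a pointer and a virtual call; the second keeps all shapes in one contiguous array and computes the area with a table lookup, which gives the compiler and CPU a simple loop over adjacent data.

```cpp
#include <memory>
#include <vector>

// "Clean" layout: every shape is its own heap object behind a pointer,
// and Area() is resolved at run time through the vtable.
struct ShapeBase {
    virtual ~ShapeBase() = default;
    virtual float Area() const = 0;
};
struct Rectangle final : ShapeBase {
    float width, height;
    Rectangle(float w, float h) : width(w), height(h) {}
    float Area() const override { return width * height; }
};
struct Triangle final : ShapeBase {
    float base, height;
    Triangle(float b, float h) : base(b), height(h) {}
    float Area() const override { return 0.5f * base * height; }
};

float TotalAreaVirtual(const std::vector<std::unique_ptr<ShapeBase>>& shapes) {
    float sum = 0.0f;
    for (const auto& shape : shapes) {
        sum += shape->Area();  // pointer chase + indirect call per element
    }
    return sum;
}

// Flat layout (hypothetical names): shapes are stored by value, side by side,
// and the per-type difference is reduced to a coefficient lookup.
enum ShapeType : unsigned { kRectangle, kTriangle, kShapeTypeCount };
struct FlatShape {
    ShapeType type;
    float width;
    float height;
};
constexpr float kAreaCoefficient[kShapeTypeCount] = {1.0f, 0.5f};

float TotalAreaFlat(const std::vector<FlatShape>& shapes) {
    float sum = 0.0f;
    for (const FlatShape& shape : shapes) {
        sum += kAreaCoefficient[shape.type] * shape.width * shape.height;
    }
    return sum;
}
```

Both functions return the same totals for the same shapes; how large the speed difference is depends on the compiler, the CPU, and the data, so it has to be measured rather than assumed.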
Great talk, thank you. Just a little disappointed to not have any information on how far away the sun is in this lecture, seems like a very important point in a programming lesson :).
This comes at the right time, because i actually have a use-case that shows exactly how writing best-practices software hurts performance, and memory use too, a lot!
We have hundreds of signals containing 10k up to 500k samples, and i always need the minimum and maximum value of the actual data. But there is a caveat: not every sample is valid, so there is a second array of the same length as the data that stores a flag indicating whether each sample is valid or not. It is not sorted, so we always have to check the flag at the same index for every value.
The team writing the library that computes the min/max values used a thread-based for-loop that computes the min/max over the valid values. It is also guarded behind at least 4 levels of indirection before the math is actually executed. It is painfully slow and it uses a ton of heap-allocated memory. That code was written using SOLID and clean-code principles -.-
After analyzing the code, i wrote a much faster solution using only vector instructions (SIMD) and stack-allocated SIMD vectors, which does the same thing and uses conditional selects so that only valid values contribute. It is at least 10x faster than the other code. After unrolling the loop 4x, it is now twice as fast again.
I didn't change the underlying crappy data structure, because it was written by another company.
Also the language is pure C# / .NET 6.0, so i can't write it in C, unfortunately :-(
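The masked min/max described in this comment can be sketched as follows. This is not the commenter's C# code: it is a scalar C++ illustration of the same "conditional select" idea, with hypothetical names (`MinMax`, `MaskedMinMax`) and an assumed representation of the validity flags as one byte per sample. Invalid samples are replaced by +infinity for the minimum and -infinity for the maximum, so they cannot change the result and the loop body does the same work for every element, which is the shape SIMD code (hand-written or auto-vectorized) needs.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

struct MinMax {
    float min;
    float max;
    bool any_valid;  // false if no sample was flagged valid
};

// Assumption: valid[i] != 0 means samples[i] may be used.
MinMax MaskedMinMax(const std::vector<float>& samples,
                    const std::vector<std::uint8_t>& valid) {
    const float inf = std::numeric_limits<float>::infinity();
    const std::size_t n = std::min(samples.size(), valid.size());

    float lo = inf;
    float hi = -inf;
    bool any = false;
    for (std::size_t i = 0; i < n; ++i) {
        const bool ok = valid[i] != 0;
        // Select instead of skip: an invalid sample becomes the identity
        // value for each reduction, so it never wins the comparison.
        const float for_min = ok ? samples[i] : inf;
        const float for_max = ok ? samples[i] : -inf;
        lo = std::min(lo, for_min);
        hi = std::max(hi, for_max);
        any = any || ok;
    }
    return {lo, hi, any};
}
```

A caller would check `any_valid` before using `min` and `max`, since a signal with no valid samples leaves them at the infinity sentinels. Whether a given compiler actually vectorizes this loop is not guaranteed; the sketch only shows the data flow.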
Thanks for making this video! One thing I notice is that we're keeping the same memory size for all the shapes by using a union. This avoids the pointer indirection, but also wastes memory. In this case it probably doesn't matter much. But what if we have polymorphic types of objects that have drastically different memory sizes? Isn't it a waste to use the maximum amount of memory to hold all types of objects?
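The memory question raised here can be illustrated with a small C++ sketch. The types below (`Circle`, `Polygon`, `AnyShape`, `ShapeStore`) are hypothetical and deliberately mismatched in size; they are not from the video. The tagged union pays for its largest member in every element, while one possible alternative, a separate contiguous array per type, pays only for what each object needs and still avoids per-object pointers. It is one option among several, with its own trade-off: a single ordering across all shapes is lost.

```cpp
#include <cstddef>
#include <vector>

struct Circle {
    float radius;
};
struct Polygon {
    float xs[16];
    float ys[16];
};

// Tagged union: every element is as large as the largest member.
struct AnyShape {
    enum Kind { kCircle, kPolygon } kind;
    union {
        Circle circle;
        Polygon polygon;
    };
};
static_assert(sizeof(AnyShape) >= sizeof(Polygon),
              "the union must be able to hold its largest member");

// Alternative sketch: one contiguous array per concrete type.
struct ShapeStore {
    std::vector<Circle> circles;
    std::vector<Polygon> polygons;
};

// Bytes of element storage used by each layout (ignores vector overhead).
std::size_t PayloadBytes(const std::vector<AnyShape>& shapes) {
    return shapes.size() * sizeof(AnyShape);
}
std::size_t PayloadBytes(const ShapeStore& store) {
    return store.circles.size() * sizeof(Circle) +
           store.polygons.size() * sizeof(Polygon);
}
```

With many small objects and a few large ones, the per-type layout uses far less element storage than the union layout; when the sizes are similar, as with the shapes in the video, the difference is small.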
I think that section alone has the potential to change the industry. Most companies are "data-driven" these days, except for their programming practices. You've just done a data-driven presentation on why the industry's practices are harmful to the software being built. Obviously, a mountain of excuses would then follow, but it's a great seed for great change.
Ah okay, you're talking about the "Uncle Bob" Clean Code book. I see it now, "G23: Prefer Polymorphism to If/Else or Switch/Case". I tend to disagree with Bob's advice on other topics online so I never bought his Clean Code book. In any case, I think you've proven pretty easily that it's not advice that anyone should follow.
I’m loving the idea of being able to escape the performance shackles of writing clean ruby code for CPU bound cases, and looking forward to finding out what I can do from the rest of the course… as well as hopefully learning how to do higher-performance graphics programming.
One question though… mightn’t Amdahl’s Law explain the proliferation of clean code practices in bespoke enterprise software and “web applications”? That type of code tends to be very storage-bound, with multiple hits to databases that are an order of magnitude larger than RAM, and have significant network latency on both request and response. I’ve had the experience of rewriting a long-running Ruby ETL job in Rust, only to find that IO overhead meant that I saw no speed improvement unless the working set fit comfortably in RAM.
Under these conditions, it seems like the relative performance costs of “clean code” become negligible?
That said, I think a lot of that is changing now. I’ve had plenty of “storage bound” workloads in the past that would easily fit in the 1TiB+ RAM of modern servers, and a 24x PCIe4.0x4 NVMe array seems like it would probably remove the bottleneck altogether?