Level-headed discussion of such a topic is sorely needed and very welcome. Looking forward to the rest of the episodes!
Having recently started exploring LLM-based coding (by company mandate), I was surprised at how capable newer models (Opus 4.6) proved to be. But there is one major thing I'm still missing from the equation, even when using these tools for adequately scoped and low-complexity tasks: namely, the costs! Which are naturally obfuscated by the big companies, who are happily losing staggering amounts of money in a mad dash for their shot at eventual dominance. Usage limits are per-request, regardless of whether you're asking for a near-instant one-liner, or one hour of iteratively crunching through problems until all tests pass, so that's not a useful measurement tool either.
I expect prices to be completely detached from reality for the time being, but for agentic workflows to become a part of proper software engineering, the real costs must also factor in at some point. It is some variation of the dancing bear problem: sure, you've genetically engineered, bred, and raised a bear that dances reasonably well... but at what cost?
having started exploring them under a mandate from on high myself, i've wondered the same thing long term. in 10 years, is it cost effective to have an LLM write a routine that would take me 20 min to do?
I really appreciated the perspective and clear thinking in this interview. I'm curious what you both think about the implications of the role of engineers in the AI-programming paradigm in the context of Peter Naur's essay "Programming as Theory Building." It seems to me that AI is (maybe fundamentally?) not capable of building theory as Naur describes it and that for non-trivial systems, it can build acceptable functionality, but not systems that are maintainable. If this is true, I think we either need engineers to build the theory and probably guide agents in code generation, or we don't need engineers and we more or less rebuild the system every time it needs modification or extension.
There are a few moving parts here, but I'll start by saying that at a minimum I agree that it's not clear if an AI "has" a theory internally (this is a complicated question of AI interpretability and borders on philosophy at the edges).
Having said that, the code itself *is* a theory of the problem. For a human we can easily distinguish the theory in the mind from the manifestation of the theory in bytes, because a human can say "I didn't mean for it to work like that" (mismatch between the imagined theory and the executable code). We can't really say if an AI is "surprised" that its creation doesn't reflect its internal imagination. To be clear, an AI can produce dialog that makes this claim; we just don't have a good way to evaluate its truth.
It's absolutely true, practically, that the more of the theory you can encode in the context (prompt + related "skills" and "tools" files), the better a result you get with today's technology. The difference between naive use and sophisticated "context engineering" is very large. Whether the person doing the "context engineering" must be an engineer *as we understand the term today* is very unclear. As I mentioned in the discussion, I know semi-technical Project Managers who are making progress driving an entirely AI team.
The last part about rebuilding is also important, and I've analogized it to "0% interest Technical Debt". The reason we currently worry about structure, clarity, and so on is that we imagine a future where we have to *keep* the existing code and reshape it for new needs. But with AI, throwing the current thing away is entirely possible. And so this connects to the idea of repeatability: given a "context", do you reliably get a very similar artifact if you run the AI build process multiple times? If so, then code generation is just another automatable build step. The thing that must remain intelligible to humans is the context, and the code itself can be thrown away every time, only to be replaced by a fresh rebuild of everything with the new context.
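To make the repeatability question a bit more concrete, here is a minimal sketch of how one might measure it, assuming you have already collected the output of several runs of the same context. The `repeatability` function and the sample `runs` are hypothetical and purely for illustration. Textual similarity is only a crude proxy; in practice, passing the same test suite is probably the more meaningful notion of "very similar artifact".

```python
from difflib import SequenceMatcher
from itertools import combinations


def repeatability(artifacts: list[str]) -> float:
    """Mean pairwise text similarity (0.0 to 1.0) across repeated builds."""
    pairs = list(combinations(artifacts, 2))
    if not pairs:
        raise ValueError("need at least two artifacts to compare")
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)


# Hypothetical outputs from three runs of the same context (made-up data).
runs = [
    "def add(a, b):\n    return a + b\n",
    "def add(a, b):\n    return a + b\n",
    "def add(x, y):\n    return x + y\n",
]
score = repeatability(runs)
print(f"mean pairwise similarity: {score:.2f}")
```

A score near 1.0 would suggest the build is stable enough to treat as a build step; a low score would mean the context underdetermines the result.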
Excellent discussion. Re. the “just show me one thing you’ve made” test, which is a fine test, I’d just add that there’s currently little incentive to participate given the reasonable expectation that your work will be immediately written off or dismissed as slop regardless of quality slash technical ambition, or of the way users are actually entangled with the thing you’ve made.
Very fair. Unfortunately the discourse is very polarized right now, and I understand people not wanting to submit their work for abuse from the hordes.
1:06:54 (and the context as far back as 1:03:46) - "what happens when a new technology comes into a market that allows you to do tasks more cheaply/efficiently/whatever" "it increases the profitability of low-value work"
there's also work that is below the threshold of priority, but not necessarily below the threshold of value. tasks that are iterative to ongoing value and also not that complicated. like a bunch of simple debugging tools that can get done that otherwise we'd intentionally mark "won't do".
I think the sign change exists, but "low value tasks" looks very different depending on the light you shine on it: "dumb dashboard for CEOs" or "dumb value chase for advertising", vs "dumb tool that makes devs slightly more productive and isn't that flashy to the CEO, when you could always be working on user-facing bugs and features instead because you don't have enough engineers to meet THAT demand"
changing economics don't always just enshittify everything.
You are absolutely correct. In fact just yesterday a friend of mine who works on a very technical project told me “I used AI to solve a long-standing bug in our testing harness [that no one had bothered to fix], which I now realize is Jevons activity”. This is absolutely a productive contribution.
This is the singular best discussion on the topic I have heard/seen. Excellent work. I cannot wait for the rest of this series.
Thank you for listening and for the kind words! We hope to continue to earn them.