LLMs Need Supervision, Not Worship
Series: AI & Expertise (2 of 3). Previously: How ChatGPT Helped Me Bend Steel Poles. Next: Cognitive RAM.
Originally published Sept 2023 on a now-deprecated blog.
The AI productivity gap isn't primarily a prompting problem. It's a code review problem.
Senior engineers don't get more from AI because they write better prompts. They get more because they've spent years learning what good code looks like.
Here's the thing nobody tells you about AI tools: they amplify whatever foundation you already have. If your foundation is strong, you get 10x leverage. If it's weak, you get 10x speed at producing confident-looking bugs.
I learned this the hard way. Not with code—with a backyard project. I asked ChatGPT how much weight my steel poles could handle. It gave me wildly different answers—44 lbs, 436 lbs, then something in the hundreds again. I dismissed the low numbers as hallucinations. They didn't match my mental model.
The poles bent immediately when kids used them.
(I wrote the full story in How ChatGPT Helped Me Bend Steel Poles.)
Turns out 2,500 lbs was when it would break. 70-100 lbs was when it would bend. I asked about breaking. I needed to ask about bending. ChatGPT answered correctly. I just didn't know enough at the time to recognize the right answer.
That's the core lesson: I didn't know enough to recognize the right answer. And the same pattern shows up in code every single day.
AI output looks plausible. It compiles. Sometimes it even passes tests. But the gap between "looks right" and "is right" is where bugs live. Where security vulnerabilities hide. In regulated industries, that gap isn't just technical debt—it's compliance risk.
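A hypothetical illustration of that gap (the function names and the pagination scenario are invented for this example, not taken from a real incident): the first function below reads fine and passes a happy-path test, but it silently drops data whenever the list length isn't a multiple of the page size.

```python
def paginate_plausible(items, page_size):
    """Looks right: splits items into pages of page_size."""
    # Bug: floor division ignores a final partial page, so trailing items vanish.
    page_count = len(items) // page_size
    return [items[i * page_size:(i + 1) * page_size] for i in range(page_count)]


def paginate_reviewed(items, page_size):
    """Steps through the list by page_size so the final partial page is kept."""
    return [items[i:i + page_size] for i in range(0, len(items), page_size)]
```

A test with four items and a page size of two passes for both versions. Only a reviewer who has been burned by partial pages thinks to try five.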
Less experienced engineers often can't spot the difference. Not because they're stupid, but because they haven't built the mental models yet.
They haven't had to debug a problem in production. Haven't reverted a simple change that ended up taking down an adjacent system. Haven't lived through enough failures to see the same patterns repeat. They haven't built the scar tissue.
That pattern recognition doesn't come from prompt engineering courses. It comes from years of scar tissue.
Here's what makes this tricky: LLMs don't build an understanding of your codebase over time. They don't accumulate judgment from consequences. Unless you supply it again, each new conversation starts without that context.
The model won't get better at your codebase. But you can get better at catching its mistakes—if you have the foundation to build on.
And here's what worries me: less experienced engineers today may not be getting the scar tissue opportunities we had.
When AI writes the first draft, you skip the struggle that builds intuition. You don't fully understand why you got where you are. You can still learn by reviewing, but it's not the same as owning the first failure. It's not the same as coding, debugging, and reworking something through three passes until you understand why a small nuance is required.
The pattern recognition that took years to build might not accumulate the same way.
This isn't a "kids these days" complaint. I didn't grow up writing compilers, and there are things about them I'll never be expert at. What matters is the subject matter you work in and the experience you've gained in it.
AI may be making seniors more productive while quietly making it harder for less experienced engineers to become seniors.
So what's the takeaway?
If you're early in your career, AI isn't a shortcut past the fundamentals.
Read the books or articles. Study well-reviewed production systems. Learn the patterns—and why there are anti-patterns.
Build and debug something end-to-end without AI first—or have AI guide you to do it yourself.
Start with Designing Data-Intensive Applications.
AI can still help you learn. Maybe it creates sample data, or spins up a mini server that streams data at your service to stress test it. But you build the actual code. You build the plan. The strategy may look different than when it was just Google and Stack Overflow—but the important thing is that you're doing it, playing with it, learning from it, and failing.
If you've been doing this a while: Your code review skills just became more valuable, not less. The world needs people who can tell the difference between working code and confident-looking bugs.
The productivity gap isn't about tools. It's about judgment. And judgment comes from experience.
What kinds of mistakes do you catch that only experience makes obvious? And how do you practice failing?