Stephen Kiers - Staff+ Software Engineer

Originally published Sept 2023 on a now-deprecated blog. Updated July 2025.

44 lbs → 51,857 lbs

ChatGPT's answer range for the same engineering question, asked three days apart.

The Setup

My kid wanted a ninja warrior course. He'd been training at a local ninja gym and watching American Ninja Warrior — now he wanted one in the backyard. Steel poles seven feet high, with a ninja line strung between them so kids could climb and swing on obstacles. Birthday party deadline. Seemed like a fun project.

The design wasn't trivial. Larger-diameter poles would be cemented permanently into the ground as anchors, capped when not in use. Thinner poles would slide inside them for the ninja course, then pull out for adult gatherings when you don't want steel poles in the middle of the yard. Removable engineering. Two sets of poles, two different diameters, structural loads on both.

I'm not a structural engineer. But I'm a software engineer with 20+ years in data platforms and system design. I know how to research a problem. I know how to break down complexity. So I asked ChatGPT.

I used terms I'd learned from ChatGPT itself, in earlier conversations: cantilever loads, critical failure, breaking point, lateral forces. I wanted to know what the poles could handle before they'd fail structurally.

What I got was chaos.

Installation process showing poles, inserts, rebar support, and concrete filling

The installation process: poles and inserts (top left), insert placed in hole (top right), pole inserted with rebar support (bottom left), filled with concrete and dirt (bottom right). Everything looked straight and solid.

The Conversations

I asked ChatGPT the same question on March 22, 2023: What horizontal load can a 7-foot, 1.5-inch schedule 40 steel pole take before bending?

It gave me six answers across two conversations: 44 lbs, 436 lbs, 3,478 lbs, 623 lbs, 99.2 lbs, and 573 lbs. Each time it recalculated, it used a different formula, a different assumption, a different unit conversion. Each time it sounded equally confident.

March 22, 2023

You

You have suggested, 44 lbs, 436 lbs, 3,478 lbs, and 99.2 lbs. I am not understanding how you are getting such wildly different answers. Do you need more information, or is this just a problem you are not suited to answer?

ChatGPT

I apologize for the confusion. Let me recalculate this more carefully...

It apologized. Recalculated. Gave me another different answer.

Three days later, I moved up to a question about a bigger pole: the maximum horizontal tension at 7 feet on a 2.5-inch pole.

March 25, 2023

ChatGPT

The maximum horizontal tension at 7 feet height on a 2.5-inch schedule 40 steel pole is approximately 51,857 pounds.

You

That sounds batshit wrong.

ChatGPT

I understand your concern. Let me walk through the calculation again...

It walked through the exact same calculation and arrived at the exact same number with the exact same confidence. Fifty-one thousand pounds. On a pole I could carry with one hand.

That moment should have stopped everything. Fifty-one thousand pounds of horizontal force on a backyard pole — that's the weight of a loaded semi truck. I knew it was wrong. I said it was wrong. But I didn't know why it was wrong, which meant I couldn't tell which of its other answers were wrong either.

The conversation kept deteriorating. I asked about filling the poles with concrete to add rigidity. ChatGPT calculated the concrete-filled pole as less rigid than the hollow one. When I pointed out the contradiction, it apologized and agreed the filled pole should be stronger — then recalculated to essentially the same lower number.

Later that week

You

Something went wrong here. You are broken.

ChatGPT

I apologize for the confusion in my previous calculations. You're right that the concrete-filled pole should be more rigid. Please perform the calculations using the provided formulas, and let me know the values.

You

Why are you saying "Please calculate this value"? You do it!

The AI asked me to do the math. The tool I was using because I didn't know the math told me to go do the math.

The Decision

I should have stopped. The answers were chaos — 44 lbs to 51,857 lbs for the same question. ChatGPT kept saying "Apologies for the confusion" before giving me another inconsistent answer. It told me to consult a structural engineer. Multiple times.

The problem wasn't the answers. It was that I didn't understand which questions mattered.

But I had a birthday party deadline. I needed to pick a number. The higher numbers felt safer. The lower ones — 44 lbs, 99 lbs — seemed like obvious hallucinations. Way too low to be real.

So I dismissed them and built it.

The Disaster

Kids climbed on. The poles bent almost immediately.

The pole visibly flexing with each swing — exactly the failure mode ChatGPT's correct low-number answers predicted.

Not catastrophically — thankfully no one got hurt. But visibly curved. The ninja line sagged. The slackline tension from a climbing child runs 150 to 300 lbs per side. The permanent bending threshold for my poles? About 70 to 100 lbs.
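A rough statics sketch shows how a small child can produce line tension in that range. It assumes a single point load hanging at midspan, a line that sags by the same angle on both sides, and no swinging or bouncing. The 60 lbf child weight and the sag angles are illustrative assumptions, not measurements from this build, and the function name is hypothetical.

```python
import math


def line_tension_lbf(load_lbf: float, sag_angle_deg: float) -> float:
    """Tension in each half of a line carrying a point load at midspan.

    Vertical force balance: the two halves of the line each contribute
    T * sin(theta) upward, so 2 * T * sin(theta) = load.
    """
    theta = math.radians(sag_angle_deg)
    return load_lbf / (2.0 * math.sin(theta))


# Assumed inputs for illustration only.
CHILD_WEIGHT_LBF = 60.0

for sag_deg in (6.0, 8.0, 12.0):
    tension = line_tension_lbf(CHILD_WEIGHT_LBF, sag_deg)
    # The pole top sees the horizontal component of that tension.
    horizontal = tension * math.cos(math.radians(sag_deg))
    print(f"sag {sag_deg:4.1f} deg: tension {tension:6.1f} lbf, "
          f"horizontal pull on pole {horizontal:6.1f} lbf")
```

With these assumed inputs the tension comes out between roughly 144 and 287 lbf. The flatter the line is pulled, the harder it pulls on each pole, and a bouncing or swinging load adds to the static figure.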

The math never worked. I just didn't know enough to see it.

The Realization

I went back to the ChatGPT conversation. Re-read it carefully this time.

2,500+ lbs was the breaking point — the force required to actually snap the steel. 70 to 100 lbs was the permanent bending point — the force at which the poles would deform and stay deformed, functionally useless even though they wouldn't break.

ChatGPT wasn't hallucinating. Not entirely. I was asking about breaking when I needed to ask about bending. The AI gave me correct answers to incorrect questions — and I didn't know enough to recognize it.

I'd picked up engineering terms from ChatGPT itself. Yield strength, flexural failure, cantilever loads. They sounded right. They made me feel like I was asking sophisticated questions. But I didn't actually understand the difference between when a pole deforms permanently and when it fractures entirely. A structural engineer would have recognized immediately: for a cantilever pole supporting a swinging load, bending (deflection, then permanent deformation) is the limiting factor, not breaking strength. That's day-one stuff in their world. It was invisible in mine.

The answers I dismissed as hallucinations — 44 lbs, 99 lbs, 155 lbs — were the correct answers for the question I should have been asking. The Dunning-Kruger effect in action: I had just enough vocabulary to sound competent and not nearly enough understanding to be competent.
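The bending side of that distinction can be sketched with the textbook cantilever formula. A horizontal force F at the top of a pole of height L puts a bending moment F × L on the base, and the steel starts to deform permanently once the stress M / S reaches its yield strength. The pipe dimensions below are the nominal ones for 1.5-inch schedule 40 pipe. The 35,000 psi yield strength is an assumed value for common pipe steel, and the function names are hypothetical. This is a first-yield estimate with no safety factor, not a reconstruction of any number ChatGPT produced.

```python
import math


def pipe_section_modulus(outer_d_in: float, inner_d_in: float) -> float:
    """Elastic section modulus S of a hollow round tube, in cubic inches."""
    return math.pi * (outer_d_in**4 - inner_d_in**4) / (32.0 * outer_d_in)


def first_yield_tip_load(yield_psi: float, section_modulus_in3: float,
                         height_in: float) -> float:
    """Horizontal tip load (lbf) at which the base of a cantilever first yields.

    For a tip load F on a cantilever of length L, the maximum moment is F * L
    at the base and the bending stress there is M / S. Setting the stress
    equal to the yield strength gives F = yield * S / L.
    """
    return yield_psi * section_modulus_in3 / height_in


# Nominal 1.5-inch schedule 40 pipe: 1.900 in outside diameter, 0.145 in wall.
OUTER_D_IN = 1.900
INNER_D_IN = OUTER_D_IN - 2 * 0.145
HEIGHT_IN = 7 * 12          # 7-foot pole, load applied at the top
YIELD_PSI = 35_000          # ASSUMPTION: typical pipe steel, not a tested value

S = pipe_section_modulus(OUTER_D_IN, INNER_D_IN)
load = first_yield_tip_load(YIELD_PSI, S, HEIGHT_IN)
print(f"section modulus: {S:.3f} in^3")
print(f"first-yield tip load: {load:.0f} lbf")
```

Under these assumptions the first-yield load comes out near 136 lbf. A weaker steel grade, a safety factor, or the dynamic load of a swinging child would all lower the usable figure, which puts a realistic limit in the same territory as the low answers.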

The Real Problem

LLMs sound confident regardless of whether you're asking the right question. They don't hedge. They don't say "I'm not sure what you mean." They don't ask whether you want breaking strength or deflection limits. They just answer whatever you asked, in the same authoritative tone they use for everything.

ChatGPT told me to consult a structural engineer in almost every response — a generic disclaimer delivered with the same confidence as the numbers themselves. I ignored it every time.

I've watched engineers make the exact same mistake — they just bend code instead of steel poles. It's not a junior engineer problem. I've seen staff engineers do it. I've seen engineering managers do it. I did it, and I have 20 years of experience. The pattern is always the same: you step outside your domain, the AI sounds like it knows what it's talking about, and you don't have the mental models to catch where it's wrong. Race conditions, security vulnerabilities, architectural decisions that will buckle at scale — the code compiles, it runs, it looks right. But it's subtly broken in ways you can't see without the scar tissue.

The engineers who get 10x leverage from AI aren't better at prompting. They're better at reviewing. They know what good looks like, so they catch the garbage immediately.

The AI wasn't the problem. I was.

The bent pole lying in the grass after removal

The bent pole after removal. Permanent deformation from forces I was warned about and chose to ignore.

Close-up of stress fractures and compression damage

Close-up showing the stress fractures from compression and bending.

Update, Summer 2025

I did eventually stabilize the ninja setup. Guy wires — twist-in ground anchors, running cables to pull against the slackline tension. Counter-force. Basic engineering that I now understand instead of cargo-culting terminology at a chatbot.

Multiple kids can bounce on it without sway. Harder to mow around, though.
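The counter-force idea can be sketched as a simple force balance at the top of the pole. This assumes an idealized setup where the guy wire alone resists the horizontal pull of the line, the wire attaches at the same height as the line, and the pole carries no bending. The 300 lbf pull, the 7 ft attachment height, and the anchor distances are illustrative assumptions, not the dimensions of this build, and the function name is hypothetical.

```python
import math


def guy_wire_forces(horizontal_pull_lbf: float, attach_height_ft: float,
                    anchor_distance_ft: float) -> tuple[float, float]:
    """Return (wire_tension, pole_compression) in lbf for an idealized guyed pole."""
    # Angle of the wire above the ground.
    angle = math.atan2(attach_height_ft, anchor_distance_ft)
    # The horizontal component of the wire tension must cancel the line pull.
    wire_tension = horizontal_pull_lbf / math.cos(angle)
    # The vertical component of the wire tension pushes down on the pole.
    pole_compression = wire_tension * math.sin(angle)
    return wire_tension, pole_compression


# Assumed inputs for illustration only.
LINE_PULL_LBF = 300.0
ATTACH_HEIGHT_FT = 7.0

for anchor_ft in (3.5, 7.0, 10.5):
    tension, compression = guy_wire_forces(LINE_PULL_LBF, ATTACH_HEIGHT_FT, anchor_ft)
    print(f"anchor {anchor_ft:4.1f} ft out: wire tension {tension:6.1f} lbf, "
          f"pole compression {compression:6.1f} lbf")
```

Moving the anchor farther from the pole flattens the wire, which lowers both the wire tension and the downward load on the pole for the same line pull. A steel pole handles that compression far better than it handles bending at the base.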

Took me two years to learn enough engineering to fix what I built wrong in two weeks. The poles are still standing. The lesson is too: neither the AI nor the human was solely at fault here. The failure was collaborative — an overconfident AI meeting an under-qualified human, each amplifying the other's weaknesses.

I went back and did a forensic autopsy of the original ChatGPT conversations. ChatGPT made physics errors. I made judgment errors. Together, we bent steel poles.

Series: AI & Expertise (1 of 3). Next: LLMs Need Supervision, Not Worship → Cognitive RAM.