AI still doesn't work very well, businesses are faking it, and a reckoning is coming

Excerpt:

"Even within the coding, it's not working well," said Smiley. "I'll give you an example. Code can look right and pass the unit tests and still be wrong. The way you measure that is typically in benchmark tests. So a lot of these companies haven't engaged in a proper feedback loop to see what the impact of AI coding is on the outcomes they care about. Lines of code, number of [pull requests], these are liabilities. These are not measures of engineering excellence."

Measures of engineering excellence, said Smiley, include metrics like deployment frequency, lead time to production, change failure rate, mean time to restore, and incident severity. And we need a new set of metrics, he insists, to measure how AI affects engineering performance.
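
As a rough illustration of the metrics Smiley lists, the sketch below computes four of them from a list of deployment records. The `Deployment` record shape, its field names, and the `delivery_metrics` function are hypothetical and exist only for this example. Real teams would pull this data from their own CI/CD and incident tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape; all field names are assumptions.
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def delivery_metrics(deployments: list, period_days: int) -> dict:
    """Summarize four delivery metrics over a window of `period_days` days."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")

    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    # Only failures with a recorded restore time count toward time to restore.
    restores = [d.restored_at - d.deployed_at
                for d in failures if d.restored_at is not None]

    return {
        "deployments_per_day": len(deployments) / period_days,
        "mean_lead_time": sum(lead_times, timedelta()) / len(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_restore": (
            sum(restores, timedelta()) / len(restores) if restores else None
        ),
    }
```

Incident severity is left out because it needs a severity scale that the excerpt does not define.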

"We don't know what those are yet," he said.

One metric that might be helpful, he said, is measuring tokens burned to get to an approved pull request – a formally accepted change in software. That's the kind of thing that needs to be assessed to determine whether AI helps an organization's engineering practice.
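
One possible reading of that metric, sketched under assumptions: total tokens spent across all AI-assisted pull requests, divided by the number that were approved. The `PullRequest` record and its fields are hypothetical. Charging the tokens from rejected pull requests to the approved ones is a design choice made for this example, not something the article specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PullRequest:
    # Hypothetical record; field names are assumptions.
    pr_id: str
    tokens_used: int   # tokens consumed while producing this pull request
    approved: bool     # True once the change was formally accepted


def tokens_per_approved_pr(prs: list) -> Optional[float]:
    """Tokens burned per approved pull request.

    Tokens spent on rejected or abandoned pull requests stay in the
    numerator, so wasted effort makes the metric worse.
    """
    approved_count = sum(1 for pr in prs if pr.approved)
    if approved_count == 0:
        return None  # nothing was approved, so the ratio is undefined
    total_tokens = sum(pr.tokens_used for pr in prs)
    return total_tokens / approved_count
```

A team could track this number over time alongside change failure rate to see whether cheaper pull requests are also worse ones.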

To underscore the consequences of not having that kind of data, Smiley pointed to a recent attempt to rewrite SQLite in Rust using AI.

"It passed all the unit tests, the shape of the code looks right," he said. "It's 3.7x more lines of code that performs 2,000 times worse than the actual SQLite. Two thousand times worse for a database is a non-viable product. It's a dumpster fire. Throw it away. All that money you spent on it is worthless."

All the optimism about using AI for coding, Smiley argues, comes from measuring the wrong things.
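
A toy example of how code can pass its unit tests and still be unusable: both lookups below return the same answers, so a correctness test cannot tell them apart, but one does orders of magnitude more work. This is not taken from the SQLite rewrite. The function names are made up for illustration, and counting examined keys is used as a stand-in for a real benchmark.

```python
def contains_linear(sorted_keys, target):
    """Correct but slow: scans every key. Returns (found, keys_examined)."""
    examined = 0
    for key in sorted_keys:
        examined += 1
        if key == target:
            return True, examined
    return False, examined


def contains_binary(sorted_keys, target):
    """Binary search over sorted keys. Returns (found, keys_examined)."""
    lo, hi = 0, len(sorted_keys)
    examined = 0
    while lo < hi:
        mid = (lo + hi) // 2
        examined += 1
        if sorted_keys[mid] == target:
            return True, examined
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return False, examined


keys = list(range(0, 200_000, 2))  # 100,000 sorted even numbers
missing = 200_001                  # larger than every key

found_linear, work_linear = contains_linear(keys, missing)
found_binary, work_binary = contains_binary(keys, missing)

# A unit test on the result alone passes for both implementations.
assert found_linear == found_binary
# Only a measurement of work done exposes the difference.
print(f"linear examined {work_linear} keys, binary examined {work_binary}")
```

Checking the boolean result is the unit test. Comparing the work counters is the benchmark, and it is the only check here that distinguishes the two implementations.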

"Coding works if you measure lines of code and pull requests," he said. "Coding does not work if you measure quality and team performance. There's no evidence to suggest that that's moving in a positive direction."

https://www.theregister.com/2026/03/17/ai_businesses_faking_it_reckoning_coming_codestrap/

Thorry

feddit.org

Yeah these newer systems are crazy. The agent spawns a dozen subagents that all do some figuring out on the code base and the user request. Then those results get collated, then passed along to a new set of subagents that make the actual changes. Then there are agents that check stuff and tell the subagents to redo stuff or make changes. And then it gets a final check like unit tests, compilation etc. And then it's marked as done for the user. The amount of tokens this burns is crazy, but it gets them better results in the benchmarks, so it gets marketed as an improvement. In reality it's still fucking up all the damned time.

Coding with AI is like coding with a junior dev, who didn't pay attention in school, is high right now, doesn't learn and only listens half of the time. It fools people into thinking it's better, because it shits out code super fast. But the cognitive load is actually higher, because checking the code is much harder than coming up with it yourself. It's slower by far. If you are actually going faster, the quality is lacking.

Flames5123 reply

sh.itjust.works

I code with AI a good bit for a side project since I need to use my work AI and get my stats up to show management that I’m using it. The “impressive” thing is learning new software and how to use it quickly in your environment. When setting up my homelab with automatic git pull, it quickly gave me some commands and showed me what to add in my docker container.

Correcting issues is exactly like coding with a high junior dev though. The code bloat is real and I’m going to attempt to use agentic AI to consolidate it in the future. I don’t believe you can really “vibe code” unless you already know how to code though. Stating the exact structures and organization and whatnot is vital for agentic AI programming semi-complex systems.

chunkystyles reply

sopuli.xyz

This is very different from my experience, but I've purposely lagged behind in adoption and I often do things the slow way because I like programming and I don't want to get too lazy and dependent.

I just recently started using Claude Code CLI. With how I use it, asking it specific questions and often telling it exactly what files and lines to analyze, it feels more like talking to an extremely knowledgeable programmer who has very narrow context and often makes short-sighted decisions.

I find it super helpful in troubleshooting. But it also feels like a trap, because I can feel it gaining my trust and I know better than to trust it.

merc reply

sh.itjust.works

"checking the code is much harder than coming up with it yourself"

That's always been true. But, at least in the past when you were checking the code written by a junior dev, the kinds of mistakes they'd make were easy to spot and easy to predict.

LLMs are created in such a way that they produce code that genuinely looks perfect at first. It's stuff that's designed to blend in and look plausible. In the past you could look at something and say "oh, this is just reversing a linked list". Now, you have to go through line by line trying to see if the thing that looks 100% plausible actually contains a tiny twist that breaks everything.

Shayeta reply

feddit.org

It's like guiding a coked-up junior who can write 5000 wpm and has read every piece of documentation ever without understanding any of it.

DickFiasco

sh.itjust.works

AI is a solution in search of a problem. Why else would there be consultants to "help shepherd organizations towards an AI strategy"? Companies are looking to use AI out of fear of missing out, not because they need it.

Saledovil reply

sh.itjust.works

The problem is that code is hard to write. AI just doesn't solve it. This is the opposite of crypto, where the product is sort of good at what it does (not bitcoin, though), but we don't actually need to do that.

rekabis reply

lemmy.ca

"AI is a solution in search of a problem."

The problem being CEOs asking themselves, “how do we acquire labour without having to pay for said labour, in order to maximize our own profit margins?”

AI was always meant to allow wealth to access labour without allowing labour to access wealth.

I, for one, am designing an entire production line of guillotines for when our capitalist system finally collapses. And for those in bunkers: a way of discovering air exchangers and all emergency exits so they can be filled with cement to turn bunkers into tombs. We need an effective method of culling sociopaths from our civilization, after all.