
Excerpt:

"Even within the coding, it's not working well," said Smiley. "I'll give you an example. Code can look right and pass the unit tests and still be wrong. The way you measure that is typically in benchmark tests. So a lot of these companies haven't engaged in a proper feedback loop to see what the impact of AI coding is on the outcomes they care about. Lines of code, number of [pull requests], these are liabilities. These are not measures of engineering excellence."

Measures of engineering excellence, said Smiley, include metrics like deployment frequency, lead time to production, change failure rate, mean time to restore, and incident severity. And we need a new set of metrics, he insists, to measure how AI affects engineering performance.
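As a rough illustration of the metrics listed above, here is a minimal Python sketch that computes four of them from deployment records. The `Deployment` record, its fields, and the `delivery_metrics` function are hypothetical, not taken from the article or any specific tool. Incident severity is left out because it needs a severity scale the article does not define.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape; real data would come from a CI/CD system.
    committed_at: datetime
    deployed_at: datetime
    caused_failure: bool = False
    restored_at: Optional[datetime] = None  # set only when caused_failure is True


def delivery_metrics(deployments: list[Deployment], period_days: int) -> dict[str, float]:
    """Summarize a non-empty list of deployments made over period_days."""
    failures = [d for d in deployments if d.caused_failure]
    # Lead time: hours from commit to running in production.
    lead_times = [
        (d.deployed_at - d.committed_at).total_seconds() / 3600 for d in deployments
    ]
    # Time to restore: hours from the failing deploy to recovery.
    restore_times = [
        (d.restored_at - d.deployed_at).total_seconds() / 3600
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "deploys_per_day": len(deployments) / period_days,
        "mean_lead_time_hours": mean(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_restore_hours": mean(restore_times) if restore_times else 0.0,
    }
```

Comparing these numbers before and after introducing AI tooling is one way to check outcomes rather than output volume.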

"We don't know what those are yet," he said.

One metric that might be helpful, he said, is measuring tokens burned to get to an approved pull request – a formally accepted change in software. That's the kind of thing that needs to be assessed to determine whether AI helps an organization's engineering practice.
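A minimal sketch of how such a metric could be computed, in Python. The `PullRequest` record and its fields are hypothetical, and counting tokens spent on rejected pull requests as part of the cost is an assumption; the article does not define the metric that precisely.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    # Hypothetical fields; no particular platform or API is implied.
    pr_id: int
    tokens_used: int
    approved: bool


def tokens_per_approved_pr(prs: list[PullRequest]) -> float:
    """Total tokens spent across all PRs divided by the number approved.

    Assumption: tokens burned on rejected or abandoned PRs still count as cost.
    """
    approved = sum(1 for pr in prs if pr.approved)
    if approved == 0:
        raise ValueError("no approved pull requests in the sample")
    return sum(pr.tokens_used for pr in prs) / approved
```

Under this definition, a team that burns many tokens on changes that never get approved scores worse than one that spends the same amount on changes that do.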

To underscore the consequences of not having that kind of data, Smiley pointed to a recent attempt to rewrite SQLite in Rust using AI.

"It passed all the unit tests, the shape of the code looks right," he said. "It's 3.7x more lines of code that performs 2,000 times worse than the actual SQLite. Two thousand times worse for a database is a non-viable product. It's a dumpster fire. Throw it away. All that money you spent on it is worthless."
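To illustrate how code can pass its unit tests and still be unusable, here is a small Python sketch. It is a toy example and has nothing to do with the actual SQLite rewrite; all function names are hypothetical. Two implementations return identical results and pass the same unit test, and only a benchmark on larger input shows the difference.

```python
import timeit


def dedupe_quadratic(items: list[int]) -> list[int]:
    """Order-preserving dedupe; the list membership test makes it O(n^2)."""
    seen: list[int] = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return seen


def dedupe_linear(items: list[int]) -> list[int]:
    """Same result in O(n): dict keys keep insertion order."""
    return list(dict.fromkeys(items))


def unit_test(dedupe) -> bool:
    # A typical small-input check; both implementations pass it.
    return dedupe([3, 1, 3, 2, 1]) == [3, 1, 2]


def benchmark(dedupe, n: int = 2000) -> float:
    """Seconds for one run on n distinct values (worst case for the quadratic version)."""
    data = list(range(n))
    return timeit.timeit(lambda: dedupe(data), number=1)
```

Calling `benchmark(dedupe_quadratic)` and `benchmark(dedupe_linear)` side by side gives the kind of comparison a correctness-only test suite never makes.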

All the optimism about using AI for coding, Smiley argues, comes from measuring the wrong things.

"Coding works if you measure lines of code and pull requests," he said. "Coding does not work if you measure quality and team performance. There's no evidence to suggest that that's moving in a positive direction."

[-] Thorry@feddit.org 126 points 2 months ago

Yeah these newer systems are crazy. The agent spawns a dozen subagents that all do some figuring out on the code base and the user request. Then those results get collated, then passed along to a new set of subagents that make the actual changes. Then there are agents that check stuff and tell the subagents to redo stuff or make changes. And then it gets a final check like unit tests, compilation etc. And then it's marked as done for the user. The amount of tokens this burns is crazy, but it gets them better results in the benchmarks, so it gets marketed as an improvement. In reality it's still fucking up all the damned time.

Coding with AI is like coding with a junior dev, who didn't pay attention in school, is high right now, doesn't learn and only listens half of the time. It fools people into thinking it's better, because it shits out code super fast. But the cognitive load is actually higher, because checking the code is much harder than coming up with it yourself. It's slower by far. If you are actually going faster, the quality is lacking.

[-] Flames5123@sh.itjust.works 31 points 2 months ago

I code with AI a good bit for a side project since I need to use my work AI and get my stats up to show management that I’m using it. The “impressive” thing is learning new software and how to use it quickly in your environment. When setting up my homelab with automatic git pull, it quickly gave me some commands and showed me what to add in my docker container.

Correcting issues is exactly like coding with a high junior dev though. The code bloat is real and I’m going to attempt to use agentic AI to consolidate it in the future. I don’t believe you can really “vibe code” unless you already know how to code though. Stating the exact structures and organization and whatnot is vital for agentic AI programming semi-complex systems.

[-] chunkystyles@sopuli.xyz 24 points 2 months ago

This is very different from my experience, but I've purposely lagged behind in adoption and I often do things the slow way because I like programming and I don't want to get too lazy and dependent.

I just recently started using Claude Code CLI. With how I use it, asking it specific questions and often telling it exactly what files and lines to analyze, it feels more like talking to an extremely knowledgeable programmer who has very narrow context and often makes short-sighted decisions.

I find it super helpful in troubleshooting. But it also feels like a trap, because I can feel it gaining my trust and I know better than to trust it.

[-] merc@sh.itjust.works 12 points 2 months ago

checking the code is much harder than coming up with it yourself

That's always been true. But, at least in the past when you were checking the code written by a junior dev, the kinds of mistakes they'd make were easy to spot and easy to predict.

LLMs are created in such a way that they produce code that genuinely looks perfect at first. It's stuff that's designed to blend in and look plausible. In the past you could look at something and say "oh, this is just reversing a linked list". Now, you have to go through line by line trying to see if the thing that looks 100% plausible actually contains a tiny twist that breaks everything.

[-] DickFiasco@sh.itjust.works 90 points 2 months ago

AI is a solution in search of a problem. Why else would there be consultants to "help shepherd organizations towards an AI strategy"? Companies are looking to use AI out of fear of missing out, not because they need it.

[-] nucleative@lemmy.world 18 points 2 months ago

When I entered the workforce in the late '90s, people were still saying this about putting PCs on every employee's desk. This was at a really profitable company. The argument was they already had telephones, pen and paper. If someone needed to write something down, they had secretaries for that who had typewriters. They had dictating machines. And Xerox machines.

And the truth was, most of the higher level employees were surely still more profitable on the phone with a client than they were sitting there pecking away at a keyboard.

Then, just a handful of years later, not only would the company have been toast had it not pushed ahead, but it was also deploying BlackBerry devices with email, laptops with remote access capabilities to most staff, and handheld PDAs (Palm Pilots) to many others.

Looking at the history of all of this, sometimes we don't know what exactly will happen with newish tech, or exactly how it will be used. But it's true that the companies that don't keep up often fall hopelessly behind.

[-] mycodesucks@lemmy.world 37 points 2 months ago

If AI is so good at what it does, then it shouldn't matter if you fall behind in adopting it... it should be able to pick up from where you need it. And if it's not mature, there's an equally valid argument to be made for not even STARTING adoption until it IS - early adopters always pay the most.

There's practically no situation where rushing now makes sense, even if the tech eventually DOES deliver on the promise.

[-] CubitOom@infosec.pub 68 points 2 months ago

Generative models, which many people call "AI", have a much higher catastrophic failure rate than we have been led to believe. They cannot actually be used to replace humans, just as an inanimate object can't replace a parent.

Jobs aren't threatened by generative models. Jobs are threatened by a credit crunch due to high interest rates and a lack of lenders being able to adapt.

"AI" is a ruse, a useful excuse that helps make people want to invest, investors & economists OK with record job loss, and the general public more susceptible to data harvesting and surveillance.

[-] jimmux@programming.dev 60 points 2 months ago

We never figured out good software productivity metrics, and now we're supposed to come up with AI effectiveness metrics? Good luck with that.

[-] Senal@programming.dev 20 points 2 months ago

Sure we did.

"Lines Of Code" is a good one, more code = more work so it must be good.

I recently had a run-in with another good one: PRs/Dev/Month.

Not only is that one good for overall productivity, it's a way to weed out those unproductive devs who check in less often.

This one was so good, management decided to add it to the company-wide catchup slides in a section espousing how the new AI-driven systems brought this number up enough to be above other companies.

That means other companies are using it as well, so it must be good.

[-] SaharaMaleikuhm@feddit.org 18 points 2 months ago

Why is it always the dumbest people who become managers?

[-] yabbadabaddon@lemmy.zip 10 points 2 months ago

The others are busy working, they don't have time to waste drinking coffee with execs

[-] gravitas_deficiency@sh.itjust.works 47 points 2 months ago

Lmfao

Deeks said "One of our friends is an SVP of one of the largest insurers in the country and he told us point blank that this is a very real problem and he does not know why people are not talking about it more."

Maybe because way too many people are making way too much money and it underpins something like 30% of the economy at this point and everyone just keeps smiling and nodding, and they’re going to keep doing that until we drive straight off the fucking cliff 🤪

[-] AnUnusualRelic@lemmy.world 11 points 2 months ago

But who's making money? All the AI corps are losing billions, only the hardware vendors are making bank.

Makers of AI lose money and users of AI probably also lose since all they get is shit output that requires more work.

[-] python@lemmy.world 46 points 2 months ago

Recently had to call out a coworker for vibecoding all her unit tests. How did I know they were vibe coded? None of the tests had an assertion, so they literally couldn't fail.

[-] ch00f@lemmy.world 29 points 2 months ago

Vibe coding guy wrote unit tests for our embedded project. Of course, the hardware peripherals aren’t available for unit tests on the dev machine/build server, so you sometimes have to write mock versions (like an “adc” function that just returns predetermined values in the format of the real analog-digital converter).

Claude wrote the tests and mock hardware so well that it forgot to include any actual code from the project. The test cases were just testing the mock hardware.
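A minimal Python sketch of that failure mode, with hypothetical names throughout (the real project is embedded code and is not shown in the thread). The first test exercises only the mock; the second uses the mock as input to the function that should actually be tested.

```python
def read_adc_mock() -> int:
    # Mock peripheral: returns a fixed raw 12-bit reading.
    return 2048


def raw_to_millivolts(raw: int, vref_mv: int = 3300, bits: int = 12) -> int:
    """Stand-in for the project code that should be under test."""
    return raw * vref_mv // (2**bits - 1)


def test_only_the_mock() -> None:
    # Passes no matter what raw_to_millivolts does: project code is never called.
    assert read_adc_mock() == 2048


def test_project_code_via_mock() -> None:
    # The mock only supplies input; the assertion targets the project function.
    assert raw_to_millivolts(read_adc_mock()) == 1650
```

A quick review check is whether each test calls at least one function from the project itself rather than only from the test helpers.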

[-] 87Six@lemmy.zip 18 points 2 months ago

Not realizing that should be an instant firing. The dev didn't even glance at the unit tests...

[-] nutsack@lemmy.dbzer0.com 9 points 2 months ago

if you reject her pull requests, does she fix it? is there a way for management to see when an employee is pushing bad commits more frequently than usual?

[-] urandom@lemmy.world 9 points 2 months ago

That's weird. I've made it write a few tests once, and it pretty much made them in the style of other tests in the repo. And they did have assertions.

[-] luciole@beehaw.org 46 points 2 months ago

This is all fine and dandy but the whole article is based on an interview with "Dorian Smiley, co-founder and CTO of AI advisory service Codestrap". Codestrap is a Palantir service provider, and as you'd expect Smiley is a Palantir shill.

The article hits different considering it's more or less a world devourer zealot taking a jab at competing world devourers. The reporter is an unsuspecting proxy at best.

[-] calliope@piefed.blahaj.zone 16 points 2 months ago* (last edited 2 months ago)

People will upvote anything if it takes a shot at AI. Even when the subtitle itself is literally an ad.

Codestrap founders say we need to dial down the hype and sort through the mess

The cult mentality is really interesting to watch.

Keep replying! Maybe this is a good honeypot for stupid people. “I hate you!!” Lmao

[-] magiccupcake@lemmy.world 40 points 2 months ago

I love this bit especially

Insurers, he said, are already lobbying state-level insurance regulators to win a carve-out in business insurance liability policies so they are not obligated to cover AI-related workflows. "That kills the whole system," Deeks said. Smiley added: "The question here is if it's all so great, why are the insurance underwriters going to great lengths to prohibit coverage for these things? They're generally pretty good at risk profiling."

[-] melsaskca@lemmy.ca 38 points 2 months ago

Businesses were failing even before AI. If I cannot eventually speak to a human on a telephone then the whole human layer is gone and I no longer want to do business with that entity.

[-] Not_mikey@lemmy.dbzer0.com 35 points 2 months ago* (last edited 2 months ago)

Guy selling ai coding platform says other AI coding platforms suck.

This just reads like a sales pitch rather than journalism. It doesn't cite any studies, just some anecdotes about what he hears "in the industry".

Half of it is:

You're measuring the wrong metrics for productivity, you should be using these new metrics that my AI coding platform does better on.

I know the AI hate is strong here but just because a company isn't pushing AI in the typical way doesn't mean they aren't trying to hype whatever they're selling up beyond reason. Nearly any tech CEO cannot be trusted, including this guy, because they're always trying to act like they can predict and make the future when they probably can't.

[-] yabbadabaddon@lemmy.zip 15 points 2 months ago

My take exactly. Especially the bits about unit tests. If you cannot rely on your unit tests as a first assessment of your code quality, your unit tests are trash.

And not every company runs GitHub. The metrics he's talking about are DevOps metrics and not development metrics. For example, in my work, nobody gives a fuck about mean time to production. We have a planning schedule and we need the ok from our customers before we can update our product.

[-] drmoose@lemmy.world 29 points 2 months ago

People delude themselves if they think LLMs are not useful for coding. People also delude themselves that all code will be AI-written in the next 2 years. The reality is that it's an incredibly useful tool but with reasonable limits.

[-] turbofan211@lemmy.world 24 points 2 months ago

So is this just early adoption problems? Or are we starting to find the ceiling for AI?

[-] riskable@programming.dev 72 points 2 months ago

The "ceiling" is the fact that no matter how fast AI can write code, it still needs to be reviewed by humans. Even if it passes the tests.

As much as everyone thinks they can take the human review step out of the process with testing, AI still fucks up enough that it's a bad idea. We'll be in this state until actually intelligent AI comes along. Some evolution of machine learning beyond LLMs.

[-] otacon239@lemmy.world 63 points 2 months ago

We just need another billion parameters bro. Surely if we just gave the LLMs another billion parameters it would solve the problem…

[-] Thorry@feddit.org 43 points 2 months ago
[-] raman_klogius@ani.social 12 points 2 months ago* (last edited 2 months ago)

That's actually three 0s too short, at the very least

[-] PancakesCantKillMe@lemmy.world 26 points 2 months ago

One smoldering Earth later….

[-] Technus@lemmy.zip 18 points 2 months ago

I realized the fundamental limitation of the current generation of AI: it's not afraid of fucking up. The fear of losing your job is a powerful source of motivation to actually get things right the first time.

And this isn't meant to glorify toxic working environments or anything like that; even in the most open and collaborative team that never tries to place blame on anyone, in general, no one likes fucking up.

So you double check your work, you try to be reasonably confident in your answers, and you make sure your code actually does what it's supposed to do. You take responsibility for your work, maybe even take pride in it.

Even now we're still having to lean on that, but we're putting all the responsibility and blame on the shoulders of the gatekeeper, not the creator. We're shooting a gun at a bulletproof vest and going "look, it's completely safe!"

[-] Feyd@programming.dev 14 points 2 months ago

fear of losing your job is a powerful source of motivation

I just feel good when things I make are good so I try to make them good. Fear is a terrible motivator for quality

[-] deadcream@sopuli.xyz 10 points 2 months ago

So you double check your work, you try to be reasonably confident in your answers, and you make sure your code actually does what it's supposed to do. You take responsibility for your work, maybe even take pride in it.

In my experience, around 50% of (professional) developers do not take pride in their work, nor do they care.

[-] dadarobot@lemmy.ml 15 points 2 months ago

something i keep thinking about: is the electricity and water usage actually cheaper than a human? i feel like once the vc money dries up the whole thing will be incredibly unsustainable.

[-] CheeseNoodle@lemmy.world 26 points 2 months ago

It's early adoption problems in the same way as putting radium in toothpaste was. There are legitimate, already growing uses for various AI systems, but as the technology is still new there's a bunch of people just trying to put it in everything, which inevitably includes a lot of places where it will never be good (at least not until it gets much better in a way that LLMs fundamentally never can, due to the underlying method by which they work).

[-] Semi_Hemi_Demigod@lemmy.world 11 points 2 months ago* (last edited 2 months ago)

My job has me working on AI stuff and it reminds me a lot of Internet technology back in the 90s.

For instance: I’m creating a local model to integrate with our MCP server. It took a lot of fiddling with a Modelfile for it to use the tools the MCP has installed. And it needs 20GB of VRAM to give reasonably accurate responses.

The amount of fiddling and checking and rough edges feel like writing JavaScript 1.0, or the switchover to HTML4.

Companies get a lot of praise for having AI products, but the reality isn’t nearly as flashy as they make it out to be. I’m seeing some usefulness in it as I learn more, but it’s not nearly what the hype machine says.

[-] SpaceNoodle@lemmy.world 9 points 2 months ago

Those of us with eyes have already seen the ceiling of currently available GenAI "solutions," which is synonymous with early adoption problems.

The technology will evolve, and the same basic problems will exist. The article has good points about how structured acceptance criteria will need to be more strictly enforced.

[-] BrightCandle@lemmy.world 17 points 2 months ago

I keep trying the various LLMs that people recommend for coding on various tasks, and they don't just get things wrong. I have been doing quite a bit of embedded work recently, and some of the designs they come up with would cause electrical fires; it's that bad. Where the earlier versions would say "oh yes, that is wrong, let me correct it..." and then often get it wrong again, the new ones will confidently tell you that you are wrong. When you tell them it caught fire, they just don't change.

I don't get it. I feel like all these people claiming success with them are just not very discerning about the quality of the code the LLMs produce, or worse, just don't know any better.

[-] Shayeta@feddit.org 11 points 2 months ago

It is possible to get good results; the problem is that you yourself need to have a very good understanding of the problem and how to solve it, and then accurately convey that to the AI.

Granted, I don't work on embedded and I'd imagine there's less code available for AI to train on than other fields.

[-] ironhydroxide@sh.itjust.works 10 points 2 months ago

Yes, I definitely want to train a new hire who is superlatively confident that they are correct, while also having to do my job correctly as well, while said new hire keeps putting shit in my work.

[-] Malgas@beehaw.org 16 points 2 months ago

This feels like an exercise in Goodhart's Law: Any measure that becomes a target ceases to be a useful measure.

[-] btsax@reddthat.com 9 points 2 months ago* (last edited 2 months ago)

These are starting to feel like those headlines "this is finally the last straw for Trump!" I've been seeing since 2015

this post was submitted on 17 Mar 2026
762 points (100.0% liked)

Programming
