Why LLMs can't really build software : technology

black_flag@lemmy.dbzer0.com 134 points 9 months ago

I think it's going to require a change in how models are built and optimized. Software engineering requires models that can do more than just generate code.

You mean to tell me that language models aren't intelligent? But that would mean all these people cramming LLMs in places where intelligence is needed are wasting their time?? Who knew?

Me.

eager_eagle@lemmy.world 46 points 9 months ago

I have a solution for that, I just need a small loan of a billion dollars and 5 years. #trustmebro

black_flag@lemmy.dbzer0.com 17 points 9 months ago

Only one billion?? What a deal! Where's my checkbook!?

TuffNutzes@lemmy.world 110 points 9 months ago

The LLM worship has to stop.

It's like saying a hammer can build a house. No, it can't.

It's useful to pound in nails and automate a lot of repetitive and boring tasks but it's not going to build the house for you - architect it, plan it, validate it.

It's similar to the whole 3D printing hype. You can 3D print a house! No you can't.

You can 3D print a wall, maybe a window.

Then have a skilled craftsman put it all together for you, ensure fit and finish, and essentially build the final product.

natecox@programming.dev 20 points 9 months ago* (last edited 6 months ago)

[This comment has been deleted by an automated system]

TuffNutzes@lemmy.world 40 points 9 months ago

Yeah, I've seen that before, and it's basically what I'm talking about. Again, that's not "printing a 3D house" as the hype would lead one to believe. It's extruding cement to build the walls around very carefully placed framing, heavily managed and coordinated by people, and finished with plumbing, electrical, etc.

It's cool that they can bring this huge piece of equipment to extrude cement to form some kind of wall. It's a neat proof of concept. I personally wouldn't want to live in a house that looked anything like that or was constructed that way. Would you?

natecox@programming.dev 16 points 9 months ago* (last edited 6 months ago)

[This comment has been deleted by an automated system]

DireTech@sh.itjust.works 17 points 9 months ago

Did you see another video about this? The one linked only showed the walls and still showed them doing interior framing. Nothing about windows, electrical, plumbing, insulation, etc.

What they showed could speed up construction but there are tons of other steps involved.

I do wonder how sturdy it is since it doesn’t look like rebar or anything else is added.

poopkins@lemmy.world 4 points 9 months ago

Spoken like a person who has never been involved in the construction of a home. It's effectively doing the job of (poorly) pouring concrete, which isn't the difficult or time-consuming part.

frog_brawler@lemmy.world 6 points 9 months ago

You’re making a great analogy with the 3D printing of a house.

However, if we consider the 3D-printed house scenario, that skilled craftsman is now able to do things on his own that he would have needed a team for in the past. Most, if not all, of the less skilled members of that team are not getting any experience within the craft at that point. They’re no longer necessary when one skilled person can do things on their own.

What happens when the skilled and highly experienced craftsmen who use AI as a supplemental tool (and subsequently earn all the work) eventually retire, and there have been no juniors or mid-levels for a while? No one is really going to be qualified without having had exposure to the trade for several years.

TuffNutzes@lemmy.world 5 points 9 months ago

Absolutely. This is a huge problem and I've read about this very problem from a number of sources. This will have a huge impact on engineering and information work.

Interestingly enough, a similar shortage occurred in the trades when information work was up and coming and the trades were shunned as a career path by many. Now we don't have enough plumbers and electricians. Tradespeople are now finding their skills in high demand and charging very high rates.

ChokingHazard@lemmy.world 5 points 9 months ago

The trades problem is a typical small-business problem with toxic work environments. I knew plenty who washed out of the trades because of that. It's the “nobody wants to work anymore” tradesmen, but really it’s “nobody wants to work with me for what I’m willing to pay.”

isaaclyman@lemmy.world 34 points 9 months ago

Clearly LLMs are useful to software engineers.

Citation needed. I don’t use one. If my coworkers do, they’re very quiet about it. More than half the posts I see promoting them, even as “just a tool,” are from people with obvious conflicts of interest. What’s “clear” to me is that the Overton window has been dragged kicking and screaming to the extreme end of the scale by five years of constant press releases masquerading as news and billions of dollars of market speculation.

I’m not going to delegate the easiest part of my job to something that’s undeniably worse at it. I’m not going to pass up opportunities to understand a system better in hopes of getting 30-minute tasks done in 10. And I’m definitely not going to pay for the privilege.

Feyd@programming.dev 14 points 9 months ago

I don't use one, and my coworkers that do use them are very loud about it, and worse at their jobs than they were a year ago.

hisao@ani.social 10 points 9 months ago

If my coworkers do, they’re very quiet about it.

Gee, guess why. Given the current culture of hate and ostracism, I would never outright say IRL that I like it or use it a lot. I would say something like "yeah, I think it can sometimes be useful when used carefully and I sometimes use it too", while in reality it would mean that it writes 95% of the code under my micromanagement.

Feyd@programming.dev 11 points 9 months ago

Wut. At software shops the prevailing atmosphere is that you should use it and broadcast it as much as possible. This person's experience is not normal.

skisnow@lemmy.ca 7 points 9 months ago

I've found them useful, sometimes, but nothing like a fraction of what the hype would suggest.

They're not adequate replacements for code reviewers, but getting an AI code review does let me occasionally fix a couple of blunders before I waste another human's time with them.

I've also had the occasional bit of luck with "why am I getting this error" questions, where it saved me 10 minutes of digging through the code myself.

"Create some test data and a smoke test for this feature" is another good timesaver for what would normally be very tedious drudge work.

What I have given up on is "implement a feature that does X" questions, because it invariably creates more work than it saves. Companies selling "type in your app idea and it'll write the code" solutions are snake-oil salesmen.

frog_brawler@lemmy.world 6 points 9 months ago* (last edited 9 months ago)

I’m not a “software engineer” but a lot of people that don’t work within tech would probably call me one.

I’m in Cloud Engineering, but came from the sys/network admin and ops side of things rather than starting off in dev or anything like that.

Up until about 5 years ago, I really only knew PowerShell and a little bit of bash. I’ve gotten up to speed in a lot of things but never officially learned Python, JS, Go or any other real development language that would be useful to me. I’ve spent way more time focusing on getting good with IaC, and probably more of the SRE type stuff.

In my particular situation, LLMs are incredibly useful. It’s fair to say that I use them daily now. I’ve had it convert bash scripts to Python for me very quickly. I don’t know Python, but now that I’m able to look at a Python script next to my bash, I’m picking up on stuff a lot faster. I’m using Lambda way more often as a result.

Also, there’s a lot of mundane filling out forms shit that I delegate to an LLM. I don’t want to spend my time filling out a form that I know no one is actually going to read. F it, I’ll have the AI write a report for an AI. It’s dumb as shit, but that’s the world today.

Aatube@kbin.melroy.org 5 points 9 months ago

https://survey.stackoverflow.co/2025/ai/

47% daily use

mojofrododojo@lemmy.world 6 points 9 months ago

47% daily use

That is NOT what that says. It says 47% of STACK OVERFLOW RESPONDENTS REPORT using AI. That does not represent 47% of devs.

If you go to 4chan and poll chuds, you're going to get a high percentage of respondents affirming your query. You went to Stack Overflow and asked about AI. Think about the user base.

frezik 33 points 9 months ago

To those who have played around with LLM code generation more than me, how are they at debugging?

I'm thinking of Kernighan's Law: "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." If vibe coding reduces the complexity of writing code by 10x, but debugging remains just as difficult as before, then Kernighan's Law needs to be updated to say debugging is 20x as hard as vibe coding. Vibe coders have no hope of bridging that gap.
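The arithmetic in that argument, spelled out. Assume writing code by hand costs effort $W$, Kernighan's Law puts debugging at $D = 2W$, and vibe coding cuts the writing cost tenfold while debugging cost stays the same (the 10x figure is the comment's hypothetical, not a measurement):

```latex
\begin{aligned}
D &= 2W && \text{(Kernighan's Law: debugging is twice the writing effort } W\text{)} \\
V &= \tfrac{W}{10} && \text{(assumed: vibe coding makes writing 10x cheaper)} \\
\frac{D}{V} &= \frac{2W}{W/10} = 20 && \text{(debugging relative to vibe-coded writing)}
\end{aligned}
```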

Ledivin@lemmy.world 25 points 9 months ago

They're not good at debugging. The article is pretty spot on, IMO: they're great at doing the work, but you are still the brain. You're still deciding what to do, and maybe 50% of the time how to do it; you're just not executing the lowest level anymore. Similar for debugging: this is not an exercise at the lowest level, and it needs you to run it.

hisao@ani.social 7 points 9 months ago

deciding what to do, and maybe 50% of the time how to do it, you’re just not executing the lowest level anymore

And that's exactly what I want. And I don't get why people want more. Having more means you have less and less control or influence over the result. What I want is for other fields to become like programming is now, so that you micromanage every step and have great control over the result.

very_well_lost@lemmy.world 17 points 9 months ago* (last edited 9 months ago)

The company I work for has recently mandated that we must start using AI tools in our workflow and is tracking our usage, so I've been experimenting with it a lot lately.

In my experience, it's worse than useless when it comes to debugging code. The class of errors that it can solve is generally simple stuff like typos and syntax errors — the sort of thing that a human would solve in 30 seconds by looking at a stack trace. The much more important class of problem, errors in the business logic, it really really sucks at solving.

For those problems, it very confidently identifies the wrong answer about 95% of the time. And if you're a dev who's desperate enough to ask AI for help debugging something, you probably don't know what's wrong either, so it won't be immediately clear whether the AI just gave you garbage or whether its suggestion has any real merit. So you go check and manually confirm that the LLM is full of shit, which costs you time... then you go back to the LLM with more context and ask it to try again. Its second suggestion will sound even more confident than the first ("Aha! I see the real cause of the issue now!"), but it will still be nonsense. You waste more time ruling out the second suggestion, then go back to the AI to scold it for being wrong again.

Rinse and repeat this cycle enough times until your manager is happy you've hit the desired usage metrics, then go open your debugging tool of choice and do the actual work.

HubertManne@piefed.social 10 points 9 months ago

Maybe it's just me, but I find typos to be the most difficult, because my brain can easily see them as correct, so the whole code looks correct. It's like the way you can take the vowels out of sentences and people can still immediately read them.

HarkMahlberg@kbin.earth 9 points 9 months ago

we must start using AI tools in our workflow and is tracking our usage

Reads to me as "Please help us justify the very expensive license we just purchased and all the talented engineers we just laid off."

I know the pain. Leadership's desperation is so thick you can smell it. They got FOMO'd, now they're humiliated, so they start lashing out.

trublu@lemmy.dbzer0.com 7 points 9 months ago

As it seems to be the case in all of these situations, AI fails hard at tasks when compared to tools specifically designed for that task. I use Ruff in all my Python projects because it formats my code and finds (and often fixes) the kind of low complexity/high probability problems that are likely to pop up as a result of human imperfection. It does it with great accuracy, incredible speed, using very little computing resources, and provides levels of safety in automating fixes. I can run it as an automation step when someone proposes code changes, adding all of 3 or 4 seconds to the runtime. I can run it on my local machine to instantly resolve my ID10T errors. If AI can't solve these problems as quickly, and if it can't solve anything more complicated reliably, I don't understand why it would be a tool I would use.

Pechente@feddit.org 14 points 9 months ago

Definitely not good. Sometimes they can solve issues but you gotta point them in the direction of the issue. Other times they write hacky workarounds that do the job for the moment but crash catastrophically with the next major dependency update.

HarkMahlberg@kbin.earth 13 points 9 months ago

I saw an LLM overload the cast (conversion) operator in C#. An evangelist would say "genius! what a novel solution!" I said "nobody at this company is going to know what this code is doing 6 months from now."

It didn't even solve our problem.

0x01@lemmy.ml 10 points 9 months ago

I use it extensively daily.

It cannot step through code right now, so true debugging is not something you use it for. Most of the time the llm will take the junior engineer approach of "guess and check" unless you explicitly give it better guidance.

My process is generally to start with unit tests and type definitions, then a large multipage prompt for every segment of the app the llm will be tasked with. Then I'll make a snapshot of the code, give the tool access to the markdown prompt, and validate its work. When there are failures and the project has extensive unit tests it generally follows the same pattern of "I see that this failure should be added to the unit tests" which it does and then re-executes them during iterative development.

If tests are not available or if it is not something directly accessible to the tool then it will generally rely on logs either directly generated or provided by the user.

My role these days is to provide long, well-thought-out prompts, verify the integrity of the code after every commit, and generally just treat the LLM as a reckless junior dev. Sometimes junior devs can surprise you: yesterday I was very surprised by a one-shot result. I asked for a mobile rn app that takes my rambling voice recordings and summarizes them into prompts; it was immediately remarkably successful, and now I've been walking around mic'd up to generate prompts.

hisao@ani.social 4 points 9 months ago

My first level of debugging is logging things to console. LLMs here do a decent job at "reading your mind" and autocompleting "pri" into something like "println!("i = {}, x = {}, y = {}", i, x, y);" with very good context awareness of what and how exactly it makes most sense to debug print in the current location in code.

dantheclamman@lemmy.world 19 points 9 months ago

LLMs are useful for providing generic examples of how a function works. This is something that would previously take an hour of searching the docs and online forums, but the LLM can do it very quickly, which I appreciate. But I have a library I want to use that was just updated with entirely new syntax, and the LLMs are pretty much useless for it. Back to the docs I go! Maybe my terrible code will help to train the model. And in my field (marine biogeochemistry), the LLM generally cannot understand the nuances of what I'm trying to do. Vibe coding is impossible. And I doubt the training set will ever be large or relevant enough for vibe coding to be feasible.

Wispy2891@lemmy.world 14 points 9 months ago

Note: this comes from someone who makes a (very good) IDE, which they monetize only through an AI subscription, so it's interesting to see their take.

(They use Claude opus like all the others so the results are similar)

hisao@ani.social 12 points 9 months ago

I love how the article baits AI haters into upvoting it, even though it's very clearly pro-AI:

At Zed we believe in a world where people and agents can collaborate together to build software. But, we firmly believe that (at least for now) you are in the driver's seat, and the LLM is just another tool to reach for.

Aatube@kbin.melroy.org 9 points 9 months ago

How is that pro-AI? It clearly very neutrally says it’s just a tool, which you can also hate.

humanspiral@lemmy.ca 11 points 9 months ago* (last edited 2 months ago)

[removed by mod]

antihumanitarian@lemmy.world 11 points 9 months ago

LLMs have made it really clear when previous concepts actually grouped together things that were distinct. Not so long ago, chess was thought to be uniquely human, until it wasn't, and language was thought to imply intelligence behind it, until it wasn't.

So let's separate out some concerns and ask what exactly we mean by engineering. To me, engineering means solving a problem: for someone, for myself, for theory, whatever. Why we want to solve the problem, what we want to do to solve it, and how we do that often get blurred together. Now, AI can supply the how in abundance. Too much abundance, even. So humans should move up the stack, focus on what problem to solve and why we want to solve it. Then, go into detail to describe what that solution looks like: for example, making a UI in Figma or writing a few sentences on how a user would actually do the thing. Then, hand that off to the AI once you think it's sufficiently defined.

The author misses an important step in the engineering loop, though. Plans almost always involve hidden assumptions and undefined or underdefined behavior that implementation will uncover. Even more so with AI: you can't just hand over a plan and expect good results. The humans need to come back, figure out what was underdefined or not actually what they wanted, and update the plan. People can 'imagine' rotating an apple in their head, but most of them will fail utterly if asked to draw it; they're holding the idea of rotating an apple, not actually rotating the apple, and application forces realization of the difference.

Nighed@feddit.uk 8 points 9 months ago

The idea of the mental model CAN be done by AI.

In my experience, if you get it to build a requirements doc first, then ask it to implement that while updating the doc as required (effectively its mental state), you will get pretty good output with decent 'debugging' ability.

This even works ok with the older 'dumber' models.

That only works when you have a comprehensive set of requirements available though. It works when you want to add a new screen/process (mostly) but good luck updating an existing one! (I haven't tried getting it to convert existing code to a requirements doc - anyone tried that?)

flop_leash_973@lemmy.world 6 points 9 months ago

I tried feeding ChatGPT a Terraform codebase once and asked it to produce an architecture diagram of what the code base would deploy to AWS.

It got most of the little blocks right for the services that would get touched. But the layout and traffic direction flow between services was nonsensical.

Truth be told it did do a better job than I thought it would initially.

PixelatedSaturn@lemmy.world 8 points 9 months ago

Good article, I couldn't agree with it more, it's exactly my experience.

The tech is being developed really fast, and that is the main issue when talking about AI. Most AI haters are using the issues we might have today to discredit the whole technology, which makes no sense to me.

And this issue the article talks about is apparent and whoever solves it will be rich.

However, it's interesting to think about the issues that come next.

HarkMahlberg@kbin.earth 10 points 9 months ago

It's true, the tech will get better in the future, we just need to believe and trust the plan.

Same thing with crypto and NFTs. They were 99% scam by volume, but who wouldn't love moving their life savings into a digital ecosystem controlled by a handful of rich gambling addicts with no consumer protections? Imagine, you'll never need to handle dirty paper money ever again, we'll just put it all in a digital wallet somewhere controlled by someone else coughmastercardcough.

And another thing, we were too harsh on the Metaverse. Sure, spending 8 hours in VR could make you vomit, and the avatars made ET for the Atari look like Uncharted 4, but it was just in its infancy!

I too want to outsource all my critical thinking to a chatbot controlled by a wealthy insular narcissist who throws Nazi salutes. The technology just needs time to mature. Who knows, maybe it can automate the exile of birthright citizens for us too!

/s

PixelatedSaturn@lemmy.world 5 points 9 months ago

That's exactly the hyperbole I was talking about. Your post is full of obvious fallacies, but the fact that you are pushing everything to the absolutes is the silliest one.

wulrus@lemmy.world 5 points 9 months ago* (last edited 9 months ago)

Interesting what he wrote about LLMs' inability to "zoom out" and see the whole picture. I use Gemini and ChatGPT sometimes to help debug admin / DevOps problems. It's a great help for extra input, a bit like rubberducking on steroids.

Examples how it went:

Problem: an Apache cluster and a connected KeyCloak cluster, with odd problems in the login flow. Reducing KeyCloak to one node solves it, so the LLM says we need to debug node communication and explains how to set the debug log settings. A lot of analysis together. But after a while, it's pretty obvious that the Apache cluster doesn't use sticky sessions correctly and forwards requests to the wrong KeyCloak node in the middle of the login flow. The LLM does not see that and wants to keep digging deeper into supposedly "odd" details of the communication between KeyCloak nodes, although the combined logs of all nodes show that the error was in load balancing.
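A toy model of that failure mode, as a sketch: each node keeps in-progress login state in local memory, so a balancer that sends step two of the flow to a different node breaks the login, while session-affine ("sticky") routing keeps both steps on one node. Assumptions: all class and function names are hypothetical, hashing the session ID stands in for a route cookie, and the model ignores both KeyCloak's own cache replication and how Apache's balancer actually implements stickiness.

```python
import hashlib
from itertools import cycle


class KeycloakNode:
    """Hypothetical stand-in for one identity-provider node with purely local login state."""

    def __init__(self, name):
        self.name = name
        self.pending_logins = set()

    def start_login(self, session_id):
        # Step 1 of the flow: remember this session locally.
        self.pending_logins.add(session_id)

    def finish_login(self, session_id):
        # Step 2 succeeds only if this node also handled step 1.
        return session_id in self.pending_logins


def round_robin(nodes):
    # Each request goes to the next node, regardless of session.
    it = cycle(nodes)
    return lambda session_id: next(it)


def sticky(nodes):
    # Pin each session to one node by hashing its ID (stand-in for a route cookie).
    def pick(session_id):
        digest = hashlib.sha256(session_id.encode()).digest()
        return nodes[digest[0] % len(nodes)]
    return pick


def login(route, session_id):
    # Two requests in one login flow, each routed independently by the balancer.
    route(session_id).start_login(session_id)
    return route(session_id).finish_login(session_id)
```

With a single node, round-robin also succeeds, which mirrors the observation that reducing KeyCloak to one node made the problem go away.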

Problem: Apache from a different cluster often returns 413 (Payload Too Large). It does happen with pretty large requests; the threshold is a bit over 8 kB, not counting the body. But the incoming request is valid. So I ask both Gemini and ChatGPT for a complete list of things that cause Apache to do that. They do a decent job at that. And one item is close: check for mod_proxy_ajp use, since the observed limit could come from trying to build an AJP packet to communicate with the backend servers. That was not the cause; the actual module was mod_jk, which also uses AJP. It helped me focus on anything using AJP when reviewing the whole config manually, so I found it, and the "rubberducking" helped indirectly. But the LLM said we must forget about AJP and focus on other possible causes, which was a dead end. When I told it the solution, it was like: Of course, mod_jk. (A 413 sounds like the request TO Apache is wrong, but actually Apache tries internally to build an AJP packet over 8 kB, fails, and blames the incoming request.)
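A rough sketch of why roughly 8 kB of headers can trip this: the AJP13 protocol packs the request line and all headers into one forward-request packet, and its default maximum packet size is 8192 bytes (mod_jk's `max_packet_size` worker property). The estimator below is a simplification, not mod_jk's code. It assumes a GET with no request attributes, spells every header name out in full (real AJP13 replaces common names such as `Host` with 2-byte codes), and uses made-up defaults for the remote address and server name.

```python
import struct

AJP_MAX_PACKET_SIZE = 8192  # bytes; AJP13 default, configurable via mod_jk's max_packet_size


def ajp_string(s: str) -> bytes:
    """AJP13 string encoding: 2-byte big-endian length, the bytes, then a NUL terminator."""
    data = s.encode("utf-8")
    return struct.pack(">H", len(data)) + data + b"\x00"


def forward_request_size(uri: str, headers: dict[str, str],
                         remote_addr: str = "10.0.0.1",
                         server_name: str = "example.org",
                         server_port: int = 443) -> int:
    """Approximate size in bytes of an AJP13 forward-request packet (no attributes, no body)."""
    payload = bytearray()
    payload += bytes([2, 2])  # prefix code 2 (forward request), method code 2 (GET)
    # protocol, req_uri, remote_addr, remote_host (reusing the address), server_name
    for field in ("HTTP/1.1", uri, remote_addr, remote_addr, server_name):
        payload += ajp_string(field)
    payload += struct.pack(">H", server_port)
    payload += b"\x01"  # is_ssl = true
    payload += struct.pack(">H", len(headers))
    for name, value in headers.items():
        # Simplification: every header name is sent as a full string here.
        payload += ajp_string(name) + ajp_string(value)
    payload += b"\xff"  # request terminator
    return 4 + len(payload)  # 4-byte packet header: 0x12 0x34 plus 2-byte payload length


def fits_in_one_packet(uri: str, headers: dict[str, str]) -> bool:
    return forward_request_size(uri, headers) <= AJP_MAX_PACKET_SIZE
```

For example, a request carrying a 9000-byte `Cookie` header does not fit, even though it is a perfectly valid HTTP request.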
