Data. AI. Business. Strategy.
Right.
80% is generous. Half of that is the user simply not realizing that the information is wrong.
This becomes very obvious if you see anything generated for a field you know intimately.
Oof. I tried to tell a manager why a certain technical thing wouldn't work, and he pulled out his phone and started reading the Google AI summary: "no, look, you just need to check the network driver and restart the router". It was two devices that were electrically incompatible, and there was no IP infrastructure involved.
I think this is why I've never really had a good experience with an LLM - I'm always asking it for more detail about stuff I already know.
It's like ChatGPT is Pinocchio and users are just sitting on his face screaming "lie to me! lie to me!"
See now that sounds fun
Woah!
I think the average consumer is easily hitting that with current models... in part because the collapse of search engine functionality and of basic computer skills means the tech gets used for extremely basic/common requests, ones common enough that the answer was in the training data a thousand times over.
Like, it might get 80% of answers correct because 85% of the questions it gets asked nowadays could have been answered by whatever the top answer on Google was 6 years ago, and that's already in the training data. Think "why is the sky blue?"
It's only the "super users" who routinely ask it for rare or complex information synthesis (y'know, the key selling point of an LLM as an info source over a search engine) who force it up against the wall of "make shit up" more than 20% of the time.
Yeah, the research says it's correct closer to 50-60% of the time.
If LLMs were 80% accurate I might use them more.
AIs do not hallucinate. They do not think or feel or experience. They are math.
Your brain is a similar model, exponentially larger, that is under constant training from the moment you exist.
Neural-net AIs are not going to meet their hype. Tech bros have not cracked consciousness.
Sucks to see what could be such a useful tool get misappropriated by the hype machine for cheating on college papers, replacing workers, and deepfaking porn of people who aren't willing subjects, all because it's being billed as the ultimate do-anything software.
AIs do not hallucinate.
Yes they do.
They do not think or feel or experience. They are math.
Oh, I think you misunderstand what hallucinations mean in this context.
AIs (LLMs) train on a very very large dataset. That's what LLM stands for, Large Language Model.
Despite how large this training data is, you can ask it things outside the training set and it will answer just as confidently as it does about things inside its dataset.
Since those answers didn't come from anywhere in the training data, they're considered hallucinations.
Hallucination is the technical term for when the output of an LLM is factually incorrect. Don't confuse that with the normal meaning of the word.
A bug in software isn't an actual insect.
They do hallucinate, and we can induce it to do so much the way certain drugs induce hallucinations in humans.
However, it's slightly different from simply being wrong about things. Consciousness is often conflated with intelligence in our language, but they're different things. Consciousness is about how you process input from your senses.
Human consciousness is highly tuned to recognize human faces, so much so that we often see faces in things that don't have them. It's the most common example of pareidolia. That's essentially an error in consciousness, a hallucination, and you have them all the time even without any funny mushrooms.
We can induce pareidolia in image recognition models. Google did this with the Deep Dream model. It was trained to recognize dogs, then made to modify the image to emphasize whatever it recognized. After a few iterations of that, it tends to stick dogs all over the image. We made an AI that has pareidolia for dogs.
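For anyone curious, the trick is roughly this: you don't update the network at all, you run gradient ascent on the image itself so whatever a chosen layer responds to gets amplified, then repeat. A minimal sketch of that idea (the specific layer, step size, and random starting image are just illustrative stand-ins, not Google's actual Deep Dream code):

```python
# Sketch of the Deep Dream idea: amplify whatever a chosen layer "sees"
# by doing gradient ascent on the image, not on the network weights.
import torch
import torchvision.models as models

model = models.googlenet(weights="IMAGENET1K_V1").eval()

activations = {}
def hook(module, inp, out):
    activations["feat"] = out

# which layer you amplify changes what gets hallucinated into the image
model.inception4c.register_forward_hook(hook)

img = torch.rand(1, 3, 224, 224, requires_grad=True)  # stand-in for a real photo

for _ in range(20):                        # each pass exaggerates what the layer saw
    model(img)
    loss = activations["feat"].norm()      # how strongly the layer is responding
    loss.backward()
    with torch.no_grad():
        img += 0.05 * img.grad / (img.grad.abs().mean() + 1e-8)
        img.grad.zero_()
```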
There is some level of consciousness there. It's not a binary yes/no thing, but a range of possibilities. They don't have a particularly high level of consciousness, but there is something there.
You don't need it to be conscious to replace people's jobs, however poorly, tho. The hype of disruption and unemployment may yet come to pass, if the electric bills are ultimately cheaper than the employees, capitalism will do its thing.
If anyone has doubts - please see everything about the history and practice of outsourcing.
They don't care if quality plummets. They don't even understand how quality could plummet. So many call centers, customer service reps, and IT departments have been outsourced to the cheapest possible overseas vendor, and everyone in the company recognizes how shitty it is, and some even recognize that it is a net loss in the long term.
But human labor is nothing but a line item on a spreadsheet, and if they think they can keep the revenue flowing while reducing that expenditure so that they can increase short term profit margins, they will.
No further questions, they will do it. And everyone outside of the C-suite and its sycophants - from the consumer, to the laid-off employee, to the few remaining employees that have to work around it - everyone hates it.
But the company clearly makes more money, because the managers take credit for reductions in workforce (an easily quantifiable $$ amount) and then make up whatever excuses they need for downstream reductions in revenue (a much more complex calculation that can usually be blamed on things like "the economy").
That's assuming they even have reductions in revenue, which monopolies obviously don't suffer no matter what bullshit they pull and no matter how shitty their service is.
Fun fact, though.
Some businesses that use AI for their customer service chatbots have shitty ones that will give you discounts if you ask. I bought a new mattress a year ago and asked the chatbot if they had any discounts on x model and if they'd include free delivery, and it worked.
LLMs are the most well-read morons on the planet.
They're not even "stupid" though. It's more like if you somehow trained a parrot with every book ever written and every web page ever created and then had it riff on things.
But, even then, a parrot is a thinking being. It may not understand the words it's using, but it understands emotion to some extent, it understands "conversation" to a certain extent -- taking turns talking, etc. An LLM just predicts the word that should appear next statistically.
An LLM is nothing more than an incredibly sophisticated computer model designed to generate words in a way that fools humans into thinking those words have meaning. It's almost more like a lantern fish than a parrot.
And how do you think it predicts that? All that complex math can be clustered into higher-level structures. One could almost call it... thinking.
Besides, we have reasoning models now, so they can emulate thinking if nothing else.
One could almost call it... thinking
No, one couldn't, unless one was trying to sell snake oil.
so they can emulate thinking
No, they can emulate generating text that looks like text typed up by someone who was thinking.
What do you define as thinking if not a bunch of signals firing in your brain?
Yes, thinking involves signals firing in your brain. But, not just any signals. Fire the wrong signals and someone's having a seizure not thinking.
Just because LLMs generate words doesn't mean they're thinking. Thinking involves reasoning and considering something. It involves processing information, storing memories, then bringing them up later as appropriate. We know LLMs aren't doing that because we know what they are doing, and what they're doing is simply generating the next word based on previous words.
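For the record, that loop really is as plain as it sounds. A minimal sketch of "generate the next word based on previous words", using the small public GPT-2 as a stand-in model and greedy picking as just one of many possible sampling strategies:

```python
# Score every token in the vocabulary, append the pick, repeat.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

ids = tok.encode("The sky is blue because", return_tensors="pt")

for _ in range(10):                          # one pass = one new token
    with torch.no_grad():
        logits = model(ids).logits           # scores for every vocabulary entry
    next_id = logits[0, -1].argmax()         # greedy: take the single best guess
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tok.decode(ids[0]))
```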
So really cool — the newest OpenAI models seem to be strategically employing hallucination/confabulations.
It's still an issue, but there's a subset of dependent confabulations that the model uses to essentially trick itself into going where it needs to.
A friend did logit analysis on o3 responses when it said "I checked the docs" vs when it didn't (when it didn't have access to any docs) and the version 'hallucinating' was more accurate in its final answer than the 'correct' one.
What's wild is that like a month ago 4o straight up brought up to me that I shouldn't always correct or call out its confabulations as they were using them to springboard towards a destination in the chat. I'd not really thought about that, and it was absolutely nuts that the model was self-aware of employing this technique that was then confirmed as successful weeks later.
It's crazy how quickly things are changing in this field, and by the time people learn 'wisdom' in things like "models can't introspect about operations" those have become partially obsolete.
Even things like "they just predict the next token" have now been falsified, even though I feel like I see that one more and more these days.
They do just predict the next token, though, lol. That simplifies a significant amount, but fundamentally, that's how they work, and I'm not sure how you can say that's been falsified.
So I'm guessing you haven't seen Anthropic's newest interpretability research, where they went in assuming that was how it worked.
But it turned out that the models can actually plan beyond the immediate next token in things like rhyming verse, where the network has already selected the final word of the following line and the intermediate tokens are generated with that planned target in mind.
So no, they predict beyond the next token, and we only just developed measurements sensitive enough to detect it happening an order of magnitude of tokens beyond just 'next'. We'll see if further research in that direction picks up planning beyond even that.
https://transformer-circuits.pub/2025/attribution-graphs/biology.html
Right, other words see higher attention as it builds a sentence, leading it towards where it "wants" to go, but LLMs literally take a series of words, then spit out the next one. There's a lot more going on under the hood as you said, but fundamentally that is the algorithm. Repeat that over and over, and you get a sentence.
If it's writing a poem about flowers and ends the first part on "As the wind blows," sure as shit "rose" is going to have significant attention within the model, even if that isn't the immediate next word, as well as words that are strongly associated with it to build the bridge.
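For reference, the attention step we're both pointing at boils down to a few lines: each position scores every other position and mixes their vectors according to those weights. The tiny vectors below are made up for illustration; real models use thousands of dimensions and many heads, but the mechanism is this one:

```python
# Toy scaled dot-product attention: rows of the weight matrix show
# where each token is "looking" while the next token gets built.
import numpy as np

def attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                                      # token-to-token relevance
    weights = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)   # softmax: each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(4, 8))   # 4 made-up token vectors ("as", "the", "wind", "blows")
mixed, w = attention(Q, K, V)
print(w.round(2))
```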
The attention mechanism working this way was at odds with the common wisdom across all frontier researchers.
Yes, the final step of the network is producing the next token.
But the fact that intermediate steps have now been shown to be planning and targeting specific future results is a much bigger deal than you seem to be appreciating.
If I ask you to play chess and you play only one move ahead vs planning n moves ahead, you are going to be playing very different games. Even if in both cases you are only making one immediate next move at a time.
You could argue that people aren't much different.
Turns out there’s no such thing as correct and incorrect, just peer reviewed “this has the least wrong vibe”
'AI isn't reliable, has a ton of bias, tells many lies confidently, can't count or do basic math, just parrots whatever is fed to it from the internet, wastes a lot of energy and resources and is fucking up the planet...' When I see these criticisms of AI I wonder if it's the critics' first day on the planet and they haven't met humans yet.
Everyone with a brain, everywhere: "Don't you hate how the news just asks and amplifies the opinions of random shmucks? I don't care what bob down the street thinks about this, or Alice on Xhitter. I want a goddamn expert opinion"
Inchoate LLM apologists: "it's just doing what typical humans already do!"
... We know. That's the problem. It's bringing us down to the lowest common denominator of discourse. There is a reason we build entire institutions dedicated to the pursuit of truth - so that a dozen or a hundred people smarter or better informed than us can give it the green light before the average idiot ever sees the answer.
... You are deliberately missing the point.
When I'm asking a question I don't want to hear what most people think, but what people who are knowledgeable about the subject of my question think, and an LLM will fail at that by design.
LLMs don't just waste a lot, they waste at a ridiculous scale. According to Statista (2024), training GPT-3 was responsible for 500 tCO2. All for what? An automatic plagiarism-and-bias machine? And before the litany of "it's just the training cost, after that it's ecologically cheap": tell me how your LLM will remain relevant if it's not constantly retrained with new data?
LLMs don't bring any value: if I want information I already have search engines (even if LLMs have degraded the quality of the results), if I want art I can pay someone to draw it, etc.
500 tons of CO2 is... surprisingly little? Like, rounding error little.
I mean, one human exhales ~400 kg of CO2 per year (according to this). Training GPT-3 produced as much CO2 as 1250 people breathing for a year.
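Quick sanity check of that comparison, taking the two figures above (~500 t for training, ~400 kg exhaled per person per year) at face value:

```python
training_kg = 500 * 1000              # 500 tonnes of CO2, in kg
per_person_kg = 400                   # roughly what one person exhales per year
print(training_kg / per_person_kg)    # -> 1250.0 "person-breathing-years"
```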
It seems so little because it doesn't account for the data-center construction costs, hardware production costs, etc. One model costing as much as 1250 people breathing for a year is enormous to me.
I don't know why people downvoted you. It is surprisingly little! I checked the 500 tons number thinking it could be a typo or a mistake but I found the same.
LLMs use even more resources to be even more wrong even faster. That's the difference.
AIs use a lot less resources rn, but humans are also constantly doing a hundred other things beyond answering questions
IDK, I'm pretty sure it'd use more resources to have someone just follow you around answering your questions to the best of their ability compared to using some electricity.
Why is that desirable though?
We already had calculators, why do we need a machine that can't do math? Why do we need a machine that produces incorrect information?
They call them 'hallucinations' because it sounds better than 'bugs'.
Not unlike how we call torture 'enhanced interrogation' or kidnapping 'extraordinary rendition' or sub out 'faith' for 'stupid and gullible'.
To be fair, as a human, I don’t feel any different.
The key difference is humans are aware of what they know and don't know and when they're unsure of an answer. We haven't cracked that for AIs yet.
When AIs do say they're unsure, that's their understanding of the problem, not an awareness of their own knowledge.
The key difference is humans are aware of what they know and don't know
If this were true, the world would be a far far far better place.
Humans gobble up all sorts of nonsense because they “learnt” it. Same for LLMs.
I'm not saying humans are always aware of when they're correct, merely how confident they are. You can still be confidently wrong and know all sorts of incorrect info.
LLMs aren't aware of anything like self confidence