AI Search Engines Are Confidently Wrong Too Often. (www.seroundtable.com)

submitted 1 day ago by Tea@programming.dev to c/technology@lemmy.world

A new study from Columbia Journalism Review showed that AI search engines and chatbots, such as OpenAI's ChatGPT Search, Perplexity, Deepseek Search, Microsoft Copilot, Grok and Google's Gemini, are just wrong, way too often.

[-] MonkderVierte@lemmy.ml 4 points 7 hours ago* (last edited 7 hours ago)

In all seriousness; studies are the first step to general knowledge outside professional circles and by extension, legislation made on it.

[-] criitz@reddthat.com 46 points 1 day ago* (last edited 23 hours ago)

When LLMs are wrong they are only confidently wrong. They don't know any other way to be wrong.

[-] 4am@lemm.ee 24 points 21 hours ago

They do not know right from wrong, they only know probability of the next word.

LLMs are a brute forcing of the imitation of intelligence. They do not think, they are not intelligent.

But I mean people today believe that 5G vaccines made the frogs gay.

[-] kubica@fedia.io 7 points 1 day ago

We only notice when they are wrong, but they can also be right just by accident.

[-] isaaclyman@lemmy.world 5 points 18 hours ago

It’s all hallucinations. It’s just that some of them happen to be right

[-] Imgonnatrythis@sh.itjust.works 3 points 22 hours ago

This does seem to be exactly the problem. It is solvable, but I haven't seen any that do it. They should be able to calculate a confidence value based on number of corresponding sources, quality ranking of sources, and how much interpolation of data is being done vs. straightforward regurgitation of facts.
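A rough sketch of what such a confidence value could look like, purely as an illustration. Everything here is an assumption: the `Source` type, the 0-to-1 quality ranking, the `interpolation_ratio` input, and the choice to multiply the three factors together are all hypothetical, and nothing in this thread says any real AI search engine works this way or can produce these inputs reliably.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """Hypothetical record for one consulted source."""
    quality: float        # assumed quality ranking, 0.0 (junk) to 1.0 (excellent)
    supports_claim: bool  # whether this source corroborates the answer


def confidence(sources: list[Source], interpolation_ratio: float) -> float:
    """Toy confidence heuristic in the range 0.0 to 1.0.

    interpolation_ratio is the assumed share of the answer that was
    inferred rather than taken directly from a source (0.0 to 1.0).
    """
    if not 0.0 <= interpolation_ratio <= 1.0:
        raise ValueError("interpolation_ratio must be between 0 and 1")

    supporting = [s for s in sources if s.supports_claim]
    if not supporting:
        return 0.0  # nothing corroborates the answer

    # Share of consulted sources that agree with the answer.
    agreement = len(supporting) / len(sources)
    # Average quality ranking of the agreeing sources.
    quality = sum(s.quality for s in supporting) / len(supporting)
    # Penalize answers that are mostly interpolated.
    directness = 1.0 - interpolation_ratio

    return agreement * quality * directness


if __name__ == "__main__":
    consulted = [Source(0.9, True), Source(0.8, True), Source(0.5, False)]
    print(f"confidence: {confidence(consulted, interpolation_ratio=0.25):.3f}")
```

The hard part is not the arithmetic but getting trustworthy inputs, such as knowing how much of an answer was interpolated.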

[-] TaviRider@reddthat.com 4 points 17 hours ago

I haven’t seen any evidence that this is solvable. You can feed in more training data, but that doesn’t mean generative AI technology is capable of using that in the way you describe.

[-] xthexder@l.sw0.com 2 points 19 hours ago

I've been saying this for a while. They need to train it to be able to say "I don't know". They need to add questions to the dataset without enough information to solve so that it can understand what is/isn't facts vs hallucinating

[-] TommySoda@lemmy.world 51 points 1 day ago

I miss the days when Google would just give a snippet of a Wikipedia article at the top and you just click the "read more" button. It may not have been exactly what you were looking for but at least it wasn't blatantly wrong. Nowadays you have to almost scroll down to the bottom just to find something relevant.

[-] bdullanw@lemm.ee 6 points 1 day ago

i almost think this is getting worse as the internet grows, there’s so much more information out there now and it’s easier and easier to push content further. i’m not surprised it’s more and more difficult to filter through the bs

[-] njordomir@lemmy.world 5 points 12 hours ago

To add to this: while there is "more information", that information is increasingly locked down and unsearchable. Things that used to be easy to find are now hidden in the walled gardens of sites like Facebook, X (formerly Twitter), etc. Google Search and similar engines basically only search ads now, as everything else is locked down. It's an internet full of data... that we can't easily access.

[-] gravitas_deficiency@sh.itjust.works 25 points 1 day ago

I’m confidently wrong a lot of the time too. But I mainly do that just to fuck with people.

[-] DioEgizio@lemm.ee 19 points 1 day ago

LLMs hallucinate. In other words, water is wet

[-] SlopppyEngineer@lemmy.world 13 points 22 hours ago

They are in the end BS generation machines that are trained so much they accidentally happen to be right often enough.

[-] OhVenus_Baby@lemmy.ml 7 points 23 hours ago

They get you 80 to 90 percent of the way to solving most problems asked. Sure, they need to be fact-checked, as any info does. They are of major use in all areas of study and life. Just not the god everyone wants them to be.

[-] LadyAutumn 13 points 22 hours ago* (last edited 11 hours ago)

They are less useful than a Wikipedia search and a dictionary. They can functionally replace humans in 0 fields that were not already automatable by machines. They are useless in any situation that warrants any degree of caution about safety.

85-90% is way over-estimated, it gets significantly worse dealing with specific tasks. And even if it was 85-90%, that's not good enough, even remotely, for just about anything. Humans make errors too, but inconsistently and inversely proportional to experience. This makes no difference to the LLM though, it will always make errors at that exact rate. The kinds of errors it can make are also not just missteps but often pure delusion and very far from what the input was requesting. They cannot reason. They have no rationale. They're imitation in its most empty form. They cannot even so much as provide information reliably.

They also ruin every single industry they come into contact with, and even worse they have utterly destroyed the usability of the internet. LLMs are a net negative for humanity in so many different ways. They deserve as much attention and investment as chatbots did back in 2005.

Their best use case scenario is in churning out an endless amount of lifeless, soulless jpg background noise and word salad articles. Their best use case is in tricking people into giving them money or ad revenue. Scamming is the only thing they are anywhere near functionally useful for.

[-] EncryptKeeper@lemmy.world 3 points 22 hours ago* (last edited 22 hours ago)

The problem is that the sales pitch of these AI answering services in search engines is to save you time from having to open search results and read them yourself. The problem with 80-90% accuracy is that if the summaries are hallucinated even once, you can no longer trust them implicitly, so in all cases you now have to verify what it says by opening search results and reading them yourself. It’s a convenience feature that doesn’t offer you any actual convenience.

Sure, it’s impressive that they are accurate 80-90% of the time, but AI used in this context is of no actual value.

[-] Patch@feddit.uk 4 points 18 hours ago

It's a real issue. A strong use case for LLM search engines is providing summaries which combine lots of facts that would take some time to compile through searching the old fashioned way. But if it's only 90% accurate and 10% hallucinated bullshit, it becomes very difficult to pick out the bullshit from the truth.
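A back-of-the-envelope sketch of why a long summary is so hard to trust. It assumes "90% accurate" means each individual claim has a 90% chance of being right and that errors are independent of each other. Both are simplifying assumptions made for illustration, not something measured in the study or in this thread, and the function name is hypothetical.

```python
def chance_of_any_error(per_claim_accuracy: float, num_claims: int) -> float:
    """Probability that a summary contains at least one wrong claim.

    Assumes every claim is independently correct with the same
    probability, which is a simplification.
    """
    if not 0.0 <= per_claim_accuracy <= 1.0:
        raise ValueError("per_claim_accuracy must be between 0 and 1")
    if num_claims < 0:
        raise ValueError("num_claims must not be negative")

    # P(all claims correct) = accuracy ** n, so P(any error) is the complement.
    return 1.0 - per_claim_accuracy ** num_claims


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(f"{n:2d} claims: {chance_of_any_error(0.9, n):.0%} chance of at least one error")
```

Under those assumptions, a ten-claim summary has roughly a 65% chance of containing at least one error, and nothing in the output marks which claim it is.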

The other day I asked Copilot to provide an overview of a particular industrial sector in my area. It produced something that was 90% a concise, accurate, readable and informative briefing, and 10% complete nonsense. It hallucinated an industrial estate that doesn't exist and a whole government programme that doesn't exist, it talked about a scheme that went defunct 20 years ago as if it were still current, etc. If it weren't for the fact that I was already very familiar with the subject, I might not have caught it. Anyone actually relying on that for useful work is in serious danger of making a complete tit of themselves.

[-] OhVenus_Baby@lemmy.ml 1 points 17 hours ago

Copilot sucks and I totally understand the POV. I stick with GPT, Mixtral. I don't think they're going anywhere anytime soon but they need significant actual refinement.

[-] Flisty@mstdn.social 3 points 18 hours ago

@EncryptKeeper @OhVenus_Baby I have very much embraced the swearing method to get rid of 50% of my Google result screen being taken up by an untrustworthy statement. Just a waste of space and scrolling time.

[-] HubertManne@moist.catsweat.com 1 points 21 hours ago

yeah my thought was like how often is a web search "right". To me ai search is just another version of search. it's like at first searching gave you a list of urls, then it gave you a list of urls with human type names as well to make it more clear what it was, then it started giving back little summaries to give an idea of what each page was saying. now it gives you a summary of many pages. My main complaint is these things should be required to give references and their answers should pretty much look like a wikipedia page but with little drop down carets or rollovers (although I prefer a drop down myself).

[-] wjrii@lemmy.world 5 points 18 hours ago

It never was, but unlike the current batch of LLM assistants that are now dominating the tops of "search" results, it never claimed to be. It was more, "here's what triggered our algorithm as 'relevant.' Figure out your life, human."

Now, instead, you have a paragraph of natural text that will literally tell you all about cities that don't exist and confidently assert that bestiality is celebrated in Washington DC because someone wrote popular werewolf slash fanfic set in Washington state. Teach the LLMs some fucking equivocation and this problem is immediately reduced, but then it makes it obvious that these things aren't Majel Barrett in Star Trek and they've been pushed out much too quickly.

[-] OhVenus_Baby@lemmy.ml 1 points 17 hours ago

From what I have seen, tools like duckAI in DuckDuckGo search cite references, and if you query GPT and other models with the specific data you're looking for, they will cite and give links to where they sourced the input from, like PubMed, etc.

For instance I will query with something like give me a list of flowers that are purple, cite all sources and ensure accuracy of data provided by cross referencing with other studies while using previous chats as context.

I find it's about how you type your queries and logic. Once you understand how the models work rather than blindly accepting them as supreme AI, then you understand their limits and how to utilize the tools for what they are.

[-] HubertManne@moist.catsweat.com 4 points 17 hours ago

I really feel it should not be necessary to ask them to cite all sources though. It should be default behavior.

[-] venotic@kbin.melroy.org 9 points 1 day ago

Then again, search engines themselves have also been proven to be wrong, inaccurate and just plain irrelevant. I've asked questions in Google before about things I need to know in general about my state out of curiosity and its results always pull up different states that do not apply to mine.

[-] TheFogan@programming.dev 20 points 1 day ago

well that's common, but the big thing is, you can see what you are working with. Big difference in at least knowing you need to try a different site when say

Google: Law about X in state1

Top result: Law about X in state3: It's illegal

Result 2 pages in: here's a list of each state and whether X is legal there... (State 1 legal)

Versus chatgpt

Is X legal in state1?

Chatgpt: No

[-] Hacksaw@lemmy.ca 4 points 13 hours ago

Narrator: it was legal in state 1

[-] catloaf@lemm.ee 12 points 1 day ago

Yeah because you're not supposed to ask search engines questions, you're supposed to use keywords.

[-] barraformat@lemm.ee 4 points 1 day ago

Always ask AI for sources and validate them. You can also request AI to use only certain sources of your liking. Never go blind to those answers.

[-] meyotch@slrpnk.net 3 points 1 day ago

At least I have found something I have in common with AI search engines.

[-] HowRu68@lemmy.world 1 points 1 day ago

Nah, it's just the ghost in the machine.

Tip: always add "True" string to the algorithm/s

this post was submitted on 11 Mar 2025

333 points (100.0% liked)
