They should create a model that's only trained on the content of .tex files.
The most popular use cases seemed to be:
General questions
Trip Planning
Buying stuff
Why is Google on there? Why is it so high on the list? Maybe for Maps? But there are already three other map providers there, plus Yelp and TripAdvisor for ratings.
Ouch.
I keep having to argue with people that the crap ChatGPT told them doesn't exist.
I asked AI to explain how to set a completely fictional setting in an admin control panel and it told me exactly where to go and what non-existent buttons to press.
I actually had someone send me a screenshot of instructions on how to do exactly what they wanted, and I sent back screenshots of me following the directions to a tee and pointing out that the option didn't exist.
And it keeps happening.
"AI" gets big uppies energy from telling you that something can be done and how to do it. It does not get big uppies energy from telling you that something isn't possible. So it's basically going to lie to you about whatever you want to hear so it gets the good good.
No, seriously, there's a weighting system to responses. When something isn't possible, it tends to be a less favorable response than hallucinating a way for it to work.
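Roughly what that weighting can look like, as a toy sketch. The scoring rules here are completely made up for illustration; no real model's reward function is this simple, but raters tending to score confident, helpful-sounding answers above refusals produces the same incentive:

```python
# Toy illustration (not a real RLHF setup): a "reward model" that,
# like human raters often do, scores helpful-sounding answers higher
# than refusals, so the confident hallucination wins.
def toy_reward(response: str) -> float:
    score = 0.0
    if "you can" in response.lower():
        score += 1.0  # confident, helpful-sounding phrasing rates well
    if "not possible" in response.lower():
        score -= 1.0  # refusals tend to be rated as unhelpful
    return score

candidates = [
    "That setting is not possible to change.",
    "Sure! You can toggle it under Settings > Display.",
]
best = max(candidates, key=toy_reward)
print(best)  # the confident (and possibly fabricated) answer wins
```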
I am quickly growing to hate this so-called "AI". I've been on the Internet long enough that I can probably guess what the AI will reply to just about any query.
It's just... Inaccurate, stupid, and not useful. Unless you're repeating something that's already been said a hundred different ways by a hundred different people and you just want to say the same thing..... Then it's great.
Hey, ChatGPT, write me a cover letter for this job posting. Cover letters suck and are generally a waste of fucking time, so who gives a shit?
To be fair, you could train an LLM on only Microsoft documentation with 100% accuracy, and it would still give you broken instructions, because Microsoft has 12 guides for how to do a thing and none of them work: they keep changing the layout, moving shit around, or renaming crap without updating their documentation.
Yeah, that experience they described could have happened before ChatGPT, because MS was already providing "as cheap as possible" general support, and it was questionable whether that was better than just publishing documentation and letting power users willing to help do so. Those support people clearly barely even understood the question and gave many irrelevant answers, which search engines then pick up and return when you search for the problem later.
Tbh, ChatGPT is a step up from that, even as bad as it is. The old support had that same annoying, overly corporate-friendly attitude but was even less accurate. Though I don't use Windows on my personal desktop anymore, so I don't have as much recent experience.
I asked AI to explain how to set a completely fictional setting in an admin control panel and it told me exactly where to go and what non-existent buttons to press.
This makes sense if you consider that it works by trying to find the most likely next word in a sentence. Ask it where you can turn off the screen defogger in Windows and it will associate "screen" with "monitor" or "display". "Turn off" -> must be a toggle... yeah, go to Settings -> Display -> Defogger toggle.
It's not AI, it's not smart, it's text prediction with a few extra tricks.
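A minimal sketch of that idea, using made-up bigram counts (real models condition on far richer context, but the failure mode is the same: the most statistically likely continuation, whether or not the resulting instruction describes anything that exists):

```python
# Toy "text prediction": pick whatever token most often followed the
# previous one in training text. All counts here are invented.
from collections import Counter

bigram_counts = {
    "screen": Counter({"display": 9, "defogger": 1}),
    "turn":   Counter({"off": 8, "on": 2}),
    "off":    Counter({"toggle": 7, "switch": 3}),
}

def next_token(prev: str) -> str:
    # Greedy choice: most frequent successor, with no check that the
    # generated instruction corresponds to a real setting.
    return bigram_counts[prev].most_common(1)[0][0]

print(next_token("screen"))  # "display"
print(next_token("off"))     # "toggle" -- hence "go flip the toggle"
```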
It just copies corporate Kool-Aid yes-man culture. If it didn't, marketing would say it's not ready for release.
Think about it: how annoyed do corporate bosses and marketing get, and how quickly do they label you as "difficult", if they come to you with a stupid idea and you call it BS? Now build the AI so that it pleases that kind of people.
There's text on Pinterest?
"Cited". This does not represent where the training data comes from, it represents the most common result when the LLM calls a tool like web_search
So basically it's just a Reddit search engine. Where most of the facts are based on "trust me bro".
Personally, I’m disappointed Truth Social isn’t on the list
I'm not a Luddite in general, but as for AI I will probably only use it as necessary in the workplace. So far the main LLM I have gotten any use out of is Google's Gemini. It lists citations for its facts when I ask it physics questions, and it seems like there is some kind of filter on the quality of the sources that can be cited. Mostly it cites professional publications, Wikipedia, etc.
I don't think Google is currently winning the AI arms race (nor do I think they have stood by their initial mantra of "Don't be evil"), but it seems like that should be the gold standard. And Google/Alphabet was also the company responsible for AlphaFold, IMO the most impressive application of learning algorithms to date.
Was this guide AI generated as well? Looks like it credits over 100% of its information gathering to the first four sites on the list.
another comment explains some responses can contain multiple sources hence >100%
Ah, so what you're saying is it doesn't get 40% of its facts from reddit, but rather 40% of its replies contain a fact cited from reddit? That would explain totals over 100%, but I'm still not sure why they wouldn't just say that of the x thousand facts AI cited, y percent came from this site. To me, that would have been more representative of what their graph title purports to offer.
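A quick sketch of the per-response counting that would produce totals over 100% (the response sets below are invented for illustration): each response can cite several sites, and the stat is "share of responses citing this site", not a share of all citations.

```python
# Each response is the set of sites it cited; one response can cite
# several sites, so per-site percentages can sum past 100%.
responses = [
    {"reddit.com", "wikipedia.org"},
    {"reddit.com"},
    {"youtube.com", "wikipedia.org"},
    {"reddit.com", "youtube.com"},
]

def pct_cited(site: str) -> float:
    # Percentage of responses that cite this site at least once.
    return 100 * sum(site in r for r in responses) / len(responses)

for site in ("reddit.com", "wikipedia.org", "youtube.com"):
    print(site, pct_cited(site))
# reddit.com 75.0, wikipedia.org 50.0, youtube.com 50.0 -> sums to 175%
```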
I'm literally just regurgitating something I saw another person comment. But yeah, if that was the case, why wouldn't they elucidate that lol
So according to AI spez is a greedy little pig boy
Regardless, in all my years on Reddit and now on Lemmy, my posting approach might've helped deep-fry those LLM results and you can thank me later.
Actually, probably 20+ years ago, I was a dumb kid who got doxxed on a popular news aggregator site. Ever since that experience, I obfuscate facts in pretty much any personal anecdotes I share. I also tend to make whimsical and nonsensical statements all the time, things which sound perfectly reasonable at first glance but which, in retrospect, would really put a damper on any LLM-style learning tool. Plus, I can't help but pretend to be some 80-year-old tech-illiterate grampa posting on the Facebooks from time to time, so that probably really makes my shit online LLM poison.
Granted, all those years of these techniques weren't to deter or detract from LLMs, just that in the end, that's another positive side effect of trying to stay a step ahead from crazy ass online stalkers, Jeremy.
In a way, it's like that scene from The Terminator where Gregor McConnor was eating a hotdog in a fancy French restaurant and faked an orgasm in front of Tom Cruise, then Sally Field was sitting at another table and told her waitress "I'll have the seabass please."
"Google.com"
Holy recursive lookups batman
It's far worse than that. AI can cite something AI generated as a source which itself is using something generated by AI as a source. So you can get an AI summary that uses an AI generated video as a source which itself used an AI generated article as a source and that article itself was an AI hallucination. We're essentially polluting the internet making it an unreliable source of information.
"It's AI all the way down!"
"What about stuff before AI?"
"That was analog intelligence which is still AI!"
Garbage in, garbage out...
I do like the early days when it would pop up crazy shit from Reddit, because they tossed it in unfiltered.
Some crazy examples floating around where someone asked "Can you fall if you run off a cliff?" and the Google search assist AI gave some classic reddit response like "if you don't look down you won't fall."
Dumb shit probably still pops up.
Over the nearly decade I spent on that platform, I averaged 1 post and 5 comments a day. I had a habit of bullshitting a lot of stuff to get people's emotions out and of pointing out a lot of hypocrisy.
So if your AI is full of shit, you can thank me by telling it to go fuck itself.
Half of the comments on Reddit and Lemmy are just stupid jokes. I don't see how the AI training is able to make the distinction, given that actual humans seem to have problems grasping the concept. Like people who lecture you on adding /s at the end of your comment.
Thank you for your service!
So aside from Wikipedia, a publicly user-maintained service which has become pretty reputable... the majority of the "facts" that LLMs collect (about 75%) are collected from privately controlled websites with curated content that is managed and maintained by corporations. And most of that content is manipulated and controlled to make people angry, frightened, sad, or anxious.
They're teaching the next AI on our negative impulses, greatest fears and worst anxieties.
What could go wrong?
This graphic is missing the enormous amount of pirated media
Walmart, Home Depot and Target.
Learned institutions.
That's not how AI learns "facts", that's how AI learns tokens.
Facebook? 😂
No wonder it keeps telling me about Hell in a Cell and an announcer's table.
Don't forget all the books, movies, music, etc they train on from pirated sources not included in the graph
Guess we're lucky Yahoo Answers didn't live long enough to make it to the top of that list.
Then again, I would love to see an LLM go "how is babby formed" when asked reproductive questions.
I'm amazed its brain isn't completely paralyzed with this dataset lmao
Yeah that's not how you're supposed to use reddit in your search. But why are there so many stores on this list of "fact" sources?
WTF? How is it EVER right?