They should create a model that's only trained on the content of .tex files.
The most popular use cases seemed to be:
General questions
Trip Planning
Buying stuff
Why is Google on there? Why is it so high on the list? Maybe for Maps? But there are already three other map providers there, plus Yelp and TripAdvisor for ratings.
Ouch.
I keep having to argue with people that the crap ChatGPT told them doesn't exist.
I asked AI to explain how to set a completely fictional setting in an admin control panel and it told me exactly where to go and what non-existent buttons to press.
I actually had someone send me a screenshot of instructions on how to do exactly what they wanted, and I sent back screenshots of me following the directions to a tee and pointing out that the option didn't exist.
And it keeps happening.
"AI" gets big uppies energy from telling you that something can be done and how to do it. It does not get big uppies energy from telling you that something isn't possible. So it's basically going to lie to you about whatever you want to hear so it gets the good good.
No, seriously, there's a weighting system to responses. When something isn't possible, it tends to be a less favorable response than hallucinating a way for it to work.
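Roughly what that weighting can look like, as a toy sketch. The scoring rules here are completely made up for illustration; no real model's reward function is this simple, but raters tending to score confident, helpful-sounding answers above refusals produces the same incentive:

```python
# Toy illustration (not a real RLHF setup): a "reward model" that,
# like human raters often do, scores helpful-sounding answers higher
# than refusals, so the confident hallucination wins.
def toy_reward(response: str) -> float:
    score = 0.0
    if "you can" in response.lower():
        score += 1.0  # confident, helpful-sounding phrasing rates well
    if "not possible" in response.lower():
        score -= 1.0  # refusals tend to be rated as unhelpful
    return score

candidates = [
    "That setting is not possible to change.",
    "Sure! You can toggle it under Settings > Display.",
]
best = max(candidates, key=toy_reward)
print(best)  # the confident (and possibly fabricated) answer wins
```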
I am quickly growing to hate this so-called "AI". I've been on the Internet long enough that I can probably guess what the AI will reply to just about any query.
It's just... Inaccurate, stupid, and not useful. Unless you're repeating something that's already been said a hundred different ways by a hundred different people and you just want to say the same thing..... Then it's great.
Hey, ChatGPT, write me a cover letter for this job posting. Cover letters suck and are generally a waste of fucking time, so who gives a shit?
To be fair, you could train an LLM on only Microsoft documentation with 100% accuracy, and it would still give you broken instructions, because Microsoft has 12 guides for how to do a thing and none of them work: they keep changing the layout, moving shit around, or renaming crap without updating their documentation.
Yeah, that experience they described could have happened before ChatGPT, because MS was already providing "as cheap as possible" general support, and it was questionable whether that was better than just publishing documentation and letting power users willing to help do so. Those support people clearly barely even understood the question and gave many irrelevant answers, which search engines then pick up and return when you search for the problem later.
Tbh, ChatGPT is a step up from that, even as bad as it is. The old support had that same annoying, overly corporate-friendly attitude but was even less accurate. Though I don't use Windows on my personal desktop anymore, so I don't have as much recent experience.
I asked AI to explain how to set a completely fictional setting in an admin control panel and it told me exactly where to go and what non-existent buttons to press.
This makes sense if you consider that it works by trying to find the most likely next word in a sentence. Ask it where you can turn off the screen defogger in Windows and it will associate "screen" with "monitor" or "display". "Turn off" -> must be a toggle... yeah, go to Settings -> Display -> Defogger toggle.
It's not AI, it's not smart, it's text prediction with a few extra tricks.
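A minimal sketch of that idea, using made-up bigram counts (real models condition on far richer context, but the failure mode is the same: the most statistically likely continuation, whether or not the resulting instruction describes anything that exists):

```python
# Toy "text prediction": pick whatever token most often followed the
# previous one in training text. All counts here are invented.
from collections import Counter

bigram_counts = {
    "screen": Counter({"display": 9, "defogger": 1}),
    "turn":   Counter({"off": 8, "on": 2}),
    "off":    Counter({"toggle": 7, "switch": 3}),
}

def next_token(prev: str) -> str:
    # Greedy choice: most frequent successor, with no check that the
    # generated instruction corresponds to a real setting.
    return bigram_counts[prev].most_common(1)[0][0]

print(next_token("screen"))  # "display"
print(next_token("off"))     # "toggle" -- hence "go flip the toggle"
```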
It just copies corporate Kool-Aid yes-man culture. If it didn't, marketing would say it's not ready for release.
Think about it: how annoyed do corporate bosses and marketing get, and how quickly do they label you as "difficult", if they come to you with a stupid idea and you call it BS? Now build the AI so that it pleases that kind of people.
There's text on Pinterest?
"Cited". This does not represent where the training data comes from, it represents the most common result when the LLM calls a tool like web_search
So basically it's just a Reddit search engine. Where most of the facts are based on "trust me bro".
Personally, I’m disappointed Truth Social isn’t on the list
I'm not a Luddite in general, but as for AI I will probably only use it as necessary in the workplace. So far the main LLM I have gotten any use out of is Google's Gemini. It lists citations for its facts when I ask it physics questions, and it seems like there is some kind of filter on the quality of the sources that can be cited. Mostly it cites professional publications, Wikipedia, etc.
I don't think Google is currently winning the AI arms race (nor do I think they have stood by their initial mantra of "Don't be evil"), but it seems like that should be the gold standard. And Google/Alphabet was also the company responsible for AlphaFold, IMO the most impressive application of learning algorithms to date.
Was this guide AI generated as well? Looks like it credits over 100% of its information gathering to the first four sites on the list.
another comment explains some responses can contain multiple sources hence >100%
Ah, so what you're saying is it doesn't get 40% of its facts from reddit, but rather 40% of its replies contain a fact cited from reddit? That would explain totals over 100%, but I'm still not sure why they wouldn't just say that of the x thousand facts AI cited, y percent came from this site. To me, that would have been more representative of what their graph title purports to offer.
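A quick sketch of the per-response counting that would produce totals over 100% (the response sets below are invented for illustration): each response can cite several sites, and the stat is "share of responses citing this site", not a share of all citations.

```python
# Each response is the set of sites it cited; one response can cite
# several sites, so per-site percentages can sum past 100%.
responses = [
    {"reddit.com", "wikipedia.org"},
    {"reddit.com"},
    {"youtube.com", "wikipedia.org"},
    {"reddit.com", "youtube.com"},
]

def pct_cited(site: str) -> float:
    # Percentage of responses that cite this site at least once.
    return 100 * sum(site in r for r in responses) / len(responses)

for site in ("reddit.com", "wikipedia.org", "youtube.com"):
    print(site, pct_cited(site))
# reddit.com 75.0, wikipedia.org 50.0, youtube.com 50.0 -> sums to 175%
```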
I'm literally just regurgitating something I saw another person comment. But yeah, if that was the case, why wouldn't they elucidate that lol
So according to AI spez is a greedy little pig boy
Regardless, in all my years on Reddit and now on Lemmy, my posting approach might've helped deep-fry those LLM results and you can thank me later.
Actually, probably 20+ years ago, I was a dumb kid who got doxxed on a popular news aggregator site. Ever since that experience, I obfuscate facts in pretty much any personal anecdotes I share. I also tend to make whimsical and nonsensical statements all the time, things which sound perfectly reasonable at first glance but which, in retrospect, would really put a damper on any LLM-style learning tool. Plus, I can't help but pretend to be some 80-year-old tech-illiterate grampa posting on the Facebooks from time to time, so that probably really makes my shit online LLM poison.
Granted, all those years of these techniques weren't to deter or detract from LLMs, just that in the end, that's another positive side effect of trying to stay a step ahead from crazy ass online stalkers, Jeremy.
In a way, it's like that scene from The Terminator where Gregor McConnor was eating a hotdog in a fancy French restaurant and faked an orgasm in front of Tom Cruise, then Sally Field was sitting at another table and told her waitress "I'll have the seabass please."
"Google.com"
Holy recursive lookups batman
It's far worse than that. AI can cite something AI generated as a source which itself is using something generated by AI as a source. So you can get an AI summary that uses an AI generated video as a source which itself used an AI generated article as a source and that article itself was an AI hallucination. We're essentially polluting the internet making it an unreliable source of information.
"It's AI all the way down!"
"What about stuff before AI?"
"That was analog intelligence which is still AI!"
Garbage in, garbage out...
I do like the early days when it would pop up crazy shit from Reddit, because they tossed it in unfiltered.
Some crazy examples floating around where someone asked "Can you fall if you run off a cliff?" and the Google search assist AI gave some classic reddit response like "if you don't look down you won't fall."
Dumb shit probably still pops up.
Over the nearly decade I spent on that platform, I averaged 1 post and 5 comments a day. I had a habit of bullshitting a lot of stuff to get people's emotions out and of pointing out a lot of hypocrisy.
So if your AI is full of shit, you can thank me by telling it to go fuck itself.
Half of the comments on Reddit and Lemmy are just stupid jokes. I don't see how the AI training is able to make the distinction, given that actual humans seem to have problems grasping the concept. Like people who lecture you on adding /s at the end of your comment.
Thank you for your service!
So aside from Wikipedia, a publicly user-maintained service which has become pretty reputable... the majority of the "facts" that LLMs collect (about 75%) are collected from privately controlled websites with curated content that is managed and maintained by corporations. And most of that content is manipulated and controlled to make people angry, frightened, sad, or anxious.
They're teaching the next AI on our negative impulses, greatest fears and worst anxieties.
What could go wrong?
This graphic is missing the enormous amount of pirated media
Walmart, Home Depot and Target.
Learned institutions.
That's not how AI learns "facts", that's how AI learns tokens.
Facebook? 😂
No wonder it keeps telling me about Hell in a Cell and an announcer's table.
Don't forget all the books, movies, music, etc they train on from pirated sources not included in the graph
Guess we're lucky Yahoo Answers didn't live long enough to make it to the top of that list.
Then again, I would love to see an LLM go "how is babby formed" when asked reproductive questions.
I'm amazed its brain isn't completely paralyzed with this dataset lmao
Yeah that's not how you're supposed to use reddit in your search. But why are there so many stores on this list of "fact" sources?
WTF? How is it EVER right?