Google is cannibalizing the web to feed AI (www.theregister.com)

submitted 3 weeks ago by throws_lemy@reddthat.com to c/technology@lemmy.world

125 comments

[-] BrightCandle@lemmy.world 188 points 3 weeks ago

The death of Stack Overflow is one of those events where a site has been completely killed by AI, and yet its content is exactly what AI needs in order to know how to solve programming problems. Its death will mark the end of AI's ability to learn how to solve programming issues. It's cannibalizing itself in the process: as it destroys its sources, it destroys its own ability to learn.

[-] artyom@piefed.social 98 points 3 weeks ago* (last edited 3 weeks ago)

It's not just that, it's shitting where it eats. People are using it to fill the internet with disinformation, then it trains itself on its own disinformation, and breeds even worse disinformation. This is why AI can never be smarter than it was in 2021.

On top of that, due to the indiscriminate DDoSing of the entire internet by AI bots, websites have been blocking any web crawlers that are not Google, which just contributes to Google's monopoly.
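For readers unfamiliar with how a site can admit one crawler and turn the rest away, here is a minimal sketch using a `robots.txt` policy checked with Python's standard `urllib.robotparser`. The policy text, the `ExampleBot` name, and the `example.com` URL are made up for illustration. Note that `robots.txt` is only advisory, so sites that really want to block bots usually enforce this at the server or CDN level as well.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: Googlebot may crawl everything, every other bot nothing.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, url: str = "https://example.com/some/page") -> bool:
    """Return True if the policy above lets `agent` fetch `url`."""
    return parser.can_fetch(agent, url)


for agent in ("Googlebot", "ExampleBot"):  # ExampleBot is a made-up crawler name
    print(agent, "allowed" if allowed(agent) else "blocked")
```

Under a policy like this, a new search engine's crawler is shut out the same way an AI scraper is, which is the monopoly effect described above.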

[-] Zarxrax@lemmy.world 15 points 3 weeks ago

I'm pretty sure AI is objectively smarter today than it was 5 years ago.

[-] SpaceNoodle@lemmy.world 38 points 3 weeks ago

Since LLMs literally can't learn, no. They're just increasingly tweaked to seem even more convincing.

[-] Sharkticon@lemmy.zip 17 points 3 weeks ago

How can something with no intelligence be smarter?

[-] SystemDisc@feddit.org 5 points 3 weeks ago* (last edited 5 days ago)

This is true, depending on what you mean by smarter. They are undeniably more capable. However, the trendy, cool thing is to hate on AI, rejecting all else. Sure, capitalism sucks, and the powerful rich people and companies who control AI suck. AI itself, though, can very easily result in massive benefits for humanity as a whole.

[-] oce@jlai.lu 4 points 3 weeks ago* (last edited 3 weeks ago)

There's better integration with all sorts of other sources of truth beyond the LLM training, which makes it seem smarter.

[-] chunes@lemmy.world 4 points 3 weeks ago

Model collapse isn't a thing anymore. https://arxiv.org/html/2510.16657v1

[-] Grandwolf319@sh.itjust.works 29 points 3 weeks ago

Our key finding is that by injecting information through an external synthetic data verifier, whether a human or a better model, synthetic retraining will not cause model collapse.

Yeah if you have a source of truth then your model is basically getting trained on that.

It’s like already having the answer

[-] CmdrShepard49@sh.itjust.works 22 points 3 weeks ago* (last edited 3 weeks ago)

Our key finding is that by injecting information through an external synthetic data verifier, whether a human or a better model, synthetic retraining will not cause model collapse.

Lol, so to make a great model, they just need to have an even better one available first or a human who can verify every single thing it ingests.

Hmm, call me skeptical on this claim.

[-] artyom@piefed.social 6 points 3 weeks ago

LOL OK

[-] CapuccinoCoretto@lemmy.world 87 points 3 weeks ago

Friends don't let friends use Google.

[-] Rhaedas@fedia.io 29 points 3 weeks ago

I can't remember the name, but when the internet was just starting and there were a lot of search engines with no dominant ones, there was an aggregator program that you could input many search engines into, then use as the searching tool. It would query all the engines and combine, sort, rank, and remove duplicate finds.

Edit: more specific - It was much like an FTP or torrent program but you'd load up what search engines to use and your search words, and it would actively pull the info then provide a single page with all results.
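The combine, rank, and de-duplicate step described above can be sketched in a few lines. This is only an illustration, not how that old program worked: it assumes each engine has already returned an ordered list of URLs, and it uses reciprocal rank fusion (a common rank-merging technique chosen here for simplicity) to score them. The function names, the constant `k = 60`, and the sample URLs are all hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Build a rough identity key so http/https, www and trailing-slash variants collapse."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower().removeprefix("www.")
    path = parts.path.rstrip("/")
    query = f"?{parts.query}" if parts.query else ""
    return f"{host}{path}{query}"


def merge_results(results_by_engine: dict[str, list[str]], k: int = 60) -> list[str]:
    """Merge per-engine ranked URL lists into one de-duplicated ranking."""
    scores: dict[str, float] = defaultdict(float)
    first_seen: dict[str, str] = {}  # key -> first URL spelling encountered
    for urls in results_by_engine.values():
        seen_here: set[str] = set()
        for rank, url in enumerate(urls, start=1):
            key = normalize(url)
            if key in seen_here:  # count each page once per engine
                continue
            seen_here.add(key)
            first_seen.setdefault(key, url)
            scores[key] += 1.0 / (k + rank)  # reciprocal rank fusion score
    ranked = sorted(scores, key=lambda key: scores[key], reverse=True)
    return [first_seen[key] for key in ranked]


sample = {
    "engine_a": ["https://example.com/a", "https://example.org/b"],
    "engine_b": ["http://www.example.org/b/", "https://example.net/c"],
}
for url in merge_results(sample):
    print(url)
```

A page that several engines agree on floats to the top even if no single engine ranked it first, which is the main appeal of merging results this way.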

The reason I mention it is because we're sort of back at that point. Google is failing, Bing never was great, and all the alternatives have their issues, usually with not having the same database to work with. So if you gathered all the best ones, the ones without ties to corporate or AI, then put their results together, maybe you'd have something like what Google was at its peak before "don't be evil" got painted over.

Incidentally, Google became what it was/is because it gobbled up a lot of those early search engines' databases. I miss you, Hotbot. You were a good one.

[-] PlantJam@lemmy.world 45 points 3 weeks ago

Search used to be so good. I had an old Honda Civic that suddenly wouldn't start. It wasn't the starter, alternator, or battery. I managed to find a forum post with my exact issue, which was that a small rubber piece on the clutch pressed a button to "tell" the starter it was okay to start. Twenty minutes later I had zip-tied a piece of plastic into place and had a working car again.

If I tried to diagnose that same issue today, it'd be dozens of SEO garbage slop sites without any actual useful information.

[-] unglueclass23@programming.dev 14 points 3 weeks ago* (last edited 3 weeks ago)

I was thinking the same thing recently. It's not the place it once was. But in general the internet has changed a lot. And it's not just AI.

1. All sorts of paywalls, especially on news sites.
2. Everything is getting centralized into a few sites, and they're usually either poorly indexable or not indexable at all (Discord, Facebook, X, Instagram and so on).
3. The Fediverse (Lemmy, Mastodon) also struggles with search engines.
4. People trying to sell you shit and create a brand, even more than before. Because of this, all sorts of SEO crap gets done, like writing BS articles nobody cares about.
5. AI slop.
6. Search engines have gotten better at getting rid of "illegal stuff".
7. A lot of sites are just presentational bloat with no substance. Very cool looking landing pages with all sorts of cool animations, but when you need to actually find the information that you need... the same UI usually gets in the way.

Oh and now we're getting into age verification crap also yay

[-] PlantJam@lemmy.world 7 points 3 weeks ago

An example of number 4, there's a poster I've seen on reddit that's posting very relevant content, but then every post ends with "@xxxxxxxx on all socials". It just takes the whole thing from content I might want to engage with to the exact opposite.

[-] bluegreenpurplepink@lemmy.world 5 points 3 weeks ago

They are literally walling off all this information that used to be easy to access and open to the public. It’s our data that we the people decided to share with the world, and these rent-seeking corporations are hiding it away so they can start charging us "tokens" to access our own public information.

[-] SillyDude@lemmy.zip 4 points 3 weeks ago

I asked gpt5 and it told me to check the clutch safety switch. The thing you fixed.

[-] skulblaka@sh.itjust.works 17 points 3 weeks ago

5-10 years ago, you could be pretty sure this was a thing that actually needed checked, since the post about the clutch safety switch was posted by a real person who presumably had the same problem as you and fixed it with this method.

Now, there's no way to know if that's actually the case, or if "clutch safety switch" is just a likely string of words to feed someone who is having car trouble. You might get lucky, or you might get sent on eight consecutive goose chases because an LLM fundamentally doesn't know what factual knowledge is, it only knows how to reorder and regurgitate things that other people have said in other contexts.

[-] Flagstaff@programming.dev 14 points 3 weeks ago

DuckDuckGo and Ecosia?

[-] FarraigePlaisteach@lemmy.world 14 points 3 weeks ago

Ecosia have planted 250,000+ trees so far and publish their accounts every month. I can’t think of a better option, unless there is a niche requirement.

[-] RiverRabbits 14 points 3 weeks ago

they burn all their efforts by pushing genAI tech on their platform

[-] FarraigePlaisteach@lemmy.world 8 points 3 weeks ago

Isn’t that hyperbole rather than truth? They’re still carbon negative.

They don’t provide AI by default (at least, I don’t get it). So people like us can continue to not use AI and the hundreds of millions who use it every day can still support tree planting.

I don’t like AI, but if they don’t add it they could risk limiting their reach and environmental goals.

[-] RiverRabbits 9 points 3 weeks ago

AI companies do not release any numbers themselves for carbon emissions. Therefore, companies that use AI cannot in any certainty claim to be carbon negative or neutral, because they have to count the supply chain emissions as well.

Not adding AI does not stifle environmental goals. In fact, you can only truthfully claim to strive for carbon goals if you do not use AI. After all, there is a reason that Microsoft abandoned their emission goals with AI as the cited reason first and foremost, which shows how incredibly dirty AI can be, even if no one releases any sensible metrics.

[-] squirrel@cake.kobel.fyi 8 points 3 weeks ago* (last edited 3 weeks ago)

Go to https://ecosia.org/ in a private browser window. It says "AI that answers to the planet". Search something and the AI Overview on top is enabled by default.

I use them with AI disabled, but it should be the default setting.

Edit: I just did a couple test searches and didn't get the AI overview. Don't know what triggers it.

[-] Yliaster@lemmy.world 14 points 3 weeks ago

Startpage and Mojeek.

DDG has contracts w Microslop.

[-] Mwa@thelemmy.club 9 points 3 weeks ago* (last edited 3 weeks ago)

Startpage uses google btw, Mojeek is decent. (I like Mojeek backend with SearXNG.)

[-] Yliaster@lemmy.world 4 points 3 weeks ago

That's disappointing. But Mojeek is kind of unusable tbh.

[-] uriel238 34 points 3 weeks ago

More Perfect Union did a video on Google's descent into evil. I think it's this one

TLDW: Once Google pivoted from being a search service to an advertising agency, it was motivated to keep users from hyperlinking away from Google, and so offered summaries and alternatives controlled by Alphabet that allowed it to keep offering you ads.

So this AI service is just a natural iteration.

[-] schwim@piefed.zip 23 points 3 weeks ago

It's the same arc every monopolistic corporation has taken before it. AI is just accelerating the pace of consuming your customer/product because profits must always increase.

There will be no large scale shift away from these experiences because most people are either ok with, apathetic about, or blissfully ignorant of the situation. The best you can do is to remove yourself from the exploitation of the userbase: Linux instead of Windows or Android, almost any search engine other than Google, fediverse instead of reddit, etc.

[-] Tollana1234567@lemmy.today 23 points 3 weeks ago* (last edited 3 weeks ago)

He means reddit mostly. AI SLOP GENERATOR, TRAINING ON SLOP like reddit, with a little plagiarizing from authors and artists.

[-] thermal_shock@lemmy.world 14 points 3 weeks ago

I called to schedule a play date at my local dog daycare/boarding, and it went to an AI answering service. I asked if it was AI since I could hear noise in the background (literal fake background chatter and noise). When she said yes, I hung up. SO tired of AI everywhere. Fuck it all.

[-] very_well_lost@lemmy.world 10 points 3 weeks ago

lmao, I bet they trained the stupid thing on recordings from massive call centers. It probably thinks that all the 'background' noise is just part of how humans communicate.

[-] tmyakal@infosec.pub 7 points 3 weeks ago

it probably thinks

It doesn't think. People need to stop anthropomorphizing the statistical probability machine.

[-] BilSabab@lemmy.world 5 points 3 weeks ago

ah yes, reddit, the most well-mannered and measured of social media platforms that never indulges itself in spreading hate and misinformation.

[-] MajorasTerribleFate@lemmy.zip 4 points 3 weeks ago

It is normal to glue pepperoni to your pizza to keep it from sliding off when baking.

[-] borth@sh.itjust.works 17 points 3 weeks ago

I don't understand how these companies want to seem and think they are so smart by choosing new niche data (scraped) to train AI in a bid to try and make it "smart"....

Has any other living being become "smart" by only ingesting information directly from the Internet? You can train other animals to perform many tasks and can probably say they are smart when they perform them as expected. I doubt any of the training methods is to tape headphones, a screen and sometimes a microphone to their faces forever (I kinda don't wanna know if this is false 😶).

The best example we have, is ourselves, and even though we use the Internet, babies are not taught how to walk and talk by only interacting with the Internet.

I feel like I might be saying too much, but I think the best AI we're gonna get is to unplug it from the Internet, and then fucking raise it for 20 years like a normal, super fast-thinking child prodigy. Then just make copies of that and train further by having it go to school for the things needed.

[-] NikkiDimes@lemmy.world 5 points 3 weeks ago

That's a very naive simplification of the AI training process. You start with that, then pay people pennies in a developing nation to produce hand-crafted training data, resulting in it using stupid words like delve and whimsical entirely too much.

Merely training on internet content with no RLHF training results in probable gibberish like that of GPT-2.

[-] Iusedtobeanalien@lemmy.world 13 points 3 weeks ago

It's a self-defeating strategy: as more people turn to AI, less content gets produced, so AI becomes static.

I truly believe the token model will kill AI, it will become too expensive

[-] iocase@lemmy.zip 16 points 3 weeks ago

It already is too expensive and adding more compute doesn't make it cheaper lol it just causes a race to the bottom among data center providers and an eventual crash there too.

[-] DylanMc6@lemmy.dbzer0.com 9 points 3 weeks ago

We need less vibe-coding and more coders with thigh-high striped socks.

[-] DJKJuicy@sh.itjust.works 8 points 3 weeks ago

[-] Taleya@aussie.zone 8 points 3 weeks ago

Google has been cannibalising the net for a decade or so now

[-] Impractical_Island@lemmy.world 7 points 3 weeks ago

Only a minute away until Google starts a Soylent Green subsidiary company

[-] Evotech@lemmy.world 4 points 3 weeks ago

Honestly I think Google is pretty fucked in the long term

Nobody googles anymore. They just ask chat.

this post was submitted on 25 May 2026

590 points (100.0% liked)
