
Codeberg: army of AI crawlers are extremely slowing us; AI crawlers learned how to solve the Anubis challenges. (i.imgur.com)

submitted 1 week ago* (last edited 6 days ago) by Pro@programming.dev to c/Technology@programming.dev


Comments

Lemmy;
Hackernews.

Source.


[-] MonkderVierte@lemmy.zip 18 points 6 days ago* (last edited 6 days ago)

I just thought that a client-side proof-of-work (or even just a delay) bound to the IP might push the AI companies to behave instead (because single-visit-per-IP crawlers get too expensive/slow, and you can just block normal abusive crawlers). But they already have mind-blowing computing and money resources and only want your data.

But if there was a simple-to-use integrated solution and every single webpage used this approach?
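
To make the idea concrete, here is a minimal hashcash-style sketch of a proof-of-work bound to the visitor's IP. It is not how Anubis is implemented; `SERVER_SECRET`, `DIFFICULTY_BITS`, and all function names are hypothetical, chosen only for illustration.

```python
import hashlib
import hmac
import itertools

SERVER_SECRET = b"change-me"  # hypothetical server-side key
DIFFICULTY_BITS = 12          # hypothetical; each extra bit doubles client work


def make_challenge(ip: str, timestamp: int) -> str:
    """Derive a challenge the server can recompute instead of storing."""
    msg = f"{ip}|{timestamp}".encode()
    return hmac.new(SERVER_SECRET, msg, hashlib.sha256).hexdigest()


def is_valid(challenge: str, nonce: int, bits: int = DIFFICULTY_BITS) -> bool:
    """True if sha256(challenge:nonce) starts with `bits` zero bits."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - bits) == 0


def solve(challenge: str, bits: int = DIFFICULTY_BITS) -> int:
    """Client side: brute-force the first nonce that meets the target."""
    for nonce in itertools.count():
        if is_valid(challenge, nonce, bits):
            return nonce


def verify(ip: str, timestamp: int, nonce: int) -> bool:
    """Server side: one HMAC and one hash, whatever the difficulty."""
    return is_valid(make_challenge(ip, timestamp), nonce)
```

Because the challenge is derived from the IP, a solution found from one address does not carry over to another, so a crawler that rotates addresses pays the solving cost on every new one. Whether that cost is high enough to matter is exactly what the replies dispute.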

[-] witten@lemmy.world 12 points 6 days ago

Believe me, these AI corporations have way too many IPs to make this feasible. I've tried per-IP rate limiting. It doesn't work on these crawlers.
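
A small sketch of why this fails, assuming a simple fixed-window limiter (the class name, limit, and window are made up for illustration, and the addresses come from documentation ranges):

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Allow at most `limit` requests per IP in each `window`-second bucket."""

    def __init__(self, limit: int, window: int) -> None:
        self.limit = limit
        self.window = window
        self.counts: dict[tuple[str, int], int] = defaultdict(int)

    def allow(self, ip: str, now: float) -> bool:
        # Requests are counted per (address, time bucket) pair.
        key = (ip, int(now // self.window))
        self.counts[key] += 1
        return self.counts[key] <= self.limit


limiter = FixedWindowLimiter(limit=5, window=60)

# One client hammering from a single address is cut off after 5 requests.
single = [limiter.allow("198.51.100.1", now=0.0) for _ in range(100)]

# The same 100 requests spread over 100 addresses are all allowed.
spread = [limiter.allow(f"203.0.113.{i}", now=0.0) for i in range(100)]
```

The limiter only ever sees one request per address in the second case, so it has nothing to act on, no matter how low the limit is set.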

[-] daniskarma@lemmy.dbzer0.com 2 points 6 days ago

Solution was invented long ago. It's called a captcha.

A little bother for legitimate users, but a good captcha is still hard to bypass even using AI.

And from the end user's standpoint, I'd rather lose 5 seconds on a captcha than have the browser run an unsolicited heavy crypto challenge on my end.

[-] Kissaki@feddit.org 9 points 5 days ago

For years, we’ve written that CAPTCHAs drive us crazy. Humans give up on CAPTCHA puzzles approximately 15% of the time and, maddeningly, CAPTCHAs are significantly easier for bots to solve than they are for humans.

https://blog.cloudflare.com/turnstile-ga/

I hate captchas.

[-] MonkderVierte@lemmy.zip 7 points 6 days ago

AI is better at solving captchas than you.

[-] daniskarma@lemmy.dbzer0.com 2 points 5 days ago* (last edited 5 days ago)

I tried, and not really.

I had to scrape a site that had a captcha, and no AI was able to solve it consistently.

In order to "crack it" I had to replicate the captcha generation algorithm as best I could and train a custom model to solve it. Only then could I crack it open. And I was lucky that the captcha generation algorithm wasn't too complex and was easy to replicate.

This amount of work is a far greater load than Anubis crypto challenges.

Take into account that AI-driven OCR learns from existing examples; if your captcha is novel enough, it is going to have a hard time solving it.

It would also drain power, which is the only point of Anubis.

[-] mholiv@lemmy.world 1 points 5 days ago

There is a difference between you (or me) sitting at home working on this and a team of highly motivated people with unlimited money.

[-] daniskarma@lemmy.dbzer0.com 1 points 5 days ago* (last edited 5 days ago)

It's not that it can't be done; it's that the cost is most likely higher than with Anubis.

[-] Taldan@lemmy.world 2 points 6 days ago

Are you planning to just outright ban IPv6 (and thus half the world)?

Any IP-based restriction is useless with IPv6.

[-] strict0768@lemmy.world 4 points 6 days ago

Not really true, you can block ranges.
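
As a rough sketch of what blocking ranges can look like, the standard-library `ipaddress` module can collapse an IPv6 address to its enclosing network, so limits or bans apply to the prefix rather than to each of its addresses. The `block_key` helper and the /64 default are assumptions for illustration; providers hand out different prefix sizes.

```python
import ipaddress


def block_key(ip: str, v6_prefix: int = 64) -> str:
    """Map an address to the key used for limits or bans."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 6:
        # Treat every address inside the same prefix as one client.
        net = ipaddress.ip_network(f"{addr}/{v6_prefix}", strict=False)
        return str(net)
    return str(addr)


def is_blocked(ip: str, blocked: list[str]) -> bool:
    """True if the address falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    for cidr in blocked:
        net = ipaddress.ip_network(cidr)
        if net.version == addr.version and addr in net:
            return True
    return False
```

This handles a client hopping between addresses within one prefix. It does not help against a crawler whose addresses are spread across many unrelated networks.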

[-] Taldan@lemmy.world 1 points 4 days ago

Okay, but how does that help? Or are you suggesting just wholesale banning entire ISPs?

[-] explodicle@sh.itjust.works 2 points 6 days ago

What if we had some protocol by which the proof-of-work is transferable? Then not only would there be a cost to using the website, but also the operator would receive that cost as payment.

[-] Taldan@lemmy.world 4 points 6 days ago* (last edited 6 days ago)

It's theoretically viable, but every attempt so far has failed.

There are a lot of practical issues, mainly that it's functionally identical to crypto-miner malware.

this post was submitted on 17 Aug 2025

686 points (100.0% liked)
