Some thoughts on how useful Anubis really is. Combined with comments I read elsewhere about scrapers starting to solve the challenges, I'm afraid Anubis will be outdated soon and we need something else.

[-] rtxn@lemmy.world 200 points 2 months ago* (last edited 2 months ago)

The current version of Anubis was made as a quick "good enough" solution to an emergency. The article is very enthusiastic about explaining why it shouldn't work, but completely glosses over the fact that it has worked, at least to an extent where deploying it and maybe inconveniencing some users is preferable to having the entire web server choked out by a flood of indiscriminate scraper requests.

The purpose is to reduce the flood to a manageable level, not to block every single scraper request.

[-] poVoq@slrpnk.net 95 points 2 months ago* (last edited 2 months ago)

And it was/is for sure the lesser evil compared to what most others did: put the site behind Cloudflare.

I feel people that complain about Anubis have never had their server overheat and shut down on an almost daily basis because of AI scrapers 🤦

[-] tofu@lemmy.nocturnal.garden 18 points 2 months ago* (last edited 2 months ago)

Yeah, I'm just wondering what's going to follow. I just hope everything isn't going to need to go behind an authwall.

[-] rtxn@lemmy.world 38 points 2 months ago
[-] grysbok@lemmy.sdf.org 22 points 2 months ago

I'll say the developer is also very responsive. They're (ambiguous 'they', not sure of pronouns) active in a libraries-fighting-bots slack channel I'm on. Libraries have been hit hard by the bots: we have hoards of tasty archives and we don't have money to throw resources at the problem.

[-] lilith267 10 points 2 months ago

Fun fact: the Anubis repo has an enbyware emblem :D

[-] tofu@lemmy.nocturnal.garden 6 points 2 months ago

Cool, thanks for posting! Also the reasoning for the image is cool.

[-] AnUnusualRelic@lemmy.world 21 points 2 months ago

The problem is that the purpose of Anubis was to make crawling more computationally expensive and that crawlers are apparently increasingly prepared to accept that additional cost. One option would be to pile some required cycles on top of what's currently asked, but it's a balancing act before it starts to really be an annoyance for the meat popsicle users.
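To put rough numbers on that balancing act, here is a minimal sketch. It assumes a hashcash-style challenge where a valid hash must start with `difficulty` zero hex digits (the thread does not spell out Anubis' exact scheme), and the hash rates are made-up illustrative figures, not measurements.

```python
# Assumption: each hash attempt succeeds with probability 16**-difficulty,
# so the number of attempts follows a geometric distribution.

def expected_attempts(difficulty: int) -> int:
    """Average number of hashes needed to find one valid nonce."""
    return 16 ** difficulty


def expected_seconds(difficulty: int, hashes_per_second: float) -> float:
    """Average wall-clock time to solve one challenge at a given hash rate."""
    return expected_attempts(difficulty) / hashes_per_second


if __name__ == "__main__":
    # Hypothetical hash rates, for illustration only.
    rates = {
        "phone browser": 50_000,
        "desktop browser": 500_000,
        "native solver": 50_000_000,
    }
    for difficulty in (4, 5, 6):
        for name, rate in rates.items():
            secs = expected_seconds(difficulty, rate)
            print(f"difficulty={difficulty} {name:>16}: ~{secs:.3f} s")
```

Each extra digit multiplies the work by 16 for everyone, so the slowest legitimate device hits the annoyance threshold long before a well-resourced crawler does.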

[-] rtxn@lemmy.world 21 points 2 months ago

That's why the developer is working on a better detection mechanism. https://xeiaso.net/blog/2025/avoiding-becoming-peg-dependency/

[-] 0_o7@lemmy.dbzer0.com 20 points 2 months ago

> The article is very enthusiastic about explaining why it shouldn't work, but completely glosses over the fact that it has worked

This post was originally written for Y Combinator's "Hacker" News, which is vehemently against people hacking things together for the greater good and, more importantly, for free.

It's more of a corporate PR release site and if you aren't known by the "community", calling out solutions they can't profit off of brings all the tech-bros to the yard for engagement.

[-] unexposedhazard@discuss.tchncs.de 70 points 2 months ago

> This… makes no sense to me. Almost by definition, an AI vendor will have a datacenter full of compute capacity.

Well, it doesn't fucking matter what "makes sense to you" because it is working...
It's being deployed by people who had their sites DDoS'd to shit by crawlers and they are very happy with the results, so what even is the point of trying to argue here?

[-] daniskarma@lemmy.dbzer0.com 13 points 2 months ago* (last edited 2 months ago)

It's working because it's not widely used. It's sort of a "pirate seagull" theory. As long as only a few people use it, it works, because scrapers don't really account for Anubis, so they don't implement systems to bypass it.

If it were to become more common it would be really easy to implement systems that would defeat the purpose.

As of right now, sites are OK because scrapers just send HTTPS requests and expect a full response. If someone wants to bypass Anubis protection, they need to take into account that they will receive a cryptographic challenge and have to solve it.

The thing is that solving cryptographic challenges can be heavily optimized. They are designed to run in a very inefficient environment, namely a browser. But if someone took the challenge and solved it in a better environment, using CUDA or something like that, it would take a fraction of the energy, defeating the purpose of "being so costly that it's not worth scraping".
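As a rough illustration of solving outside a browser, here is a minimal sketch in plain Python. It assumes a SHA-256 challenge where `challenge + nonce` must hash to a value starting with `difficulty` zero hex digits; the challenge string and the function name `solve` are hypothetical, and the real Anubis wire format is not described in this thread.

```python
import hashlib
import time
from itertools import count


def solve(challenge: str, difficulty: int) -> tuple[int, str]:
    """Brute-force the first nonce whose SHA-256 digest has the required prefix."""
    prefix = "0" * difficulty
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest


if __name__ == "__main__":
    start = time.perf_counter()
    nonce, digest = solve("hypothetical-challenge", 4)
    elapsed = time.perf_counter() - start
    # nonce + 1 hashes were computed, since counting starts at 0.
    print(f"nonce={nonce} digest={digest[:16]}... in {elapsed:.3f} s")
    print(f"~{(nonce + 1) / max(elapsed, 1e-9):,.0f} hashes/s")
```

Even this unoptimized loop needs no browser, no JavaScript engine and no page rendering; a compiled or GPU implementation would be faster still, which is the concern raised above.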

At this point it's only a matter of time before we start seeing scrapers like that, especially if more and more sites start using Anubis.

[-] rtxn@lemmy.world 38 points 2 months ago

New developments: just a few hours before I post this comment, The Register posted an article about AI crawler traffic. https://www.theregister.com/2025/08/21/ai_crawler_traffic/

Anubis' developer was interviewed and they posted the responses on their website: https://xeiaso.net/notes/2025/el-reg-responses/

In particular:

Fastly's claims that 80% of bot traffic is now AI crawlers

In some cases for open source projects, we've seen upwards of 95% of traffic being AI crawlers. For one, deploying Anubis almost instantly caused server load to crater by so much that it made them think they accidentally took their site offline. One of my customers had their power bills drop by a significant fraction after deploying Anubis. It's nuts.

So, yeah. If we believe Xe, OOP's article is complete hogwash.

[-] tofu@lemmy.nocturnal.garden 9 points 2 months ago

Cool article, thanks for linking! Not sure about that being a new development though, it's just results, but we already knew it's working. The question is, what's going to work once the scrapers adapt?

[-] Klear@quokk.au 26 points 2 months ago* (last edited 2 months ago)

> If that sounds familiar, it’s because it’s similar to how bitcoin mining works. Anubis is not literally mining cryptocurrency, but it is similar in concept to other projects that do exactly that

Did the author only now discover cryptography? It's like a cryptocurrency, just without currency, what a concept!

[-] SkaveRat@discuss.tchncs.de 11 points 2 months ago

It's a perfectly valid way to explain it, though

If you try to show up with "cryptography" as an explanation, people will think of encrypting messages, not proof of work

"Cryptocurrency without the currency" really is the perfect single-sentence explanation

[-] Dremor@lemmy.world 23 points 2 months ago

Anubis is not a challenge like a captcha. Anubis is a resource waster, forcing crawlers to solve a crypto challenge (basically like mining bitcoin) before being allowed in. That's how it defends so well against bots: since they do not want to waste their resources on needless computing, they just cancel the page load before it even happens and go crawl elsewhere.
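The "resource waster" idea rests on an asymmetry: finding a valid answer takes many hashes, checking it takes one. A minimal sketch, assuming a SHA-256 leading-zeros scheme (the function names and challenge string are hypothetical, not Anubis' actual API):

```python
import hashlib


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash confirms or rejects a submitted nonce."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


def attempts_needed(challenge: str, difficulty: int) -> int:
    """Client side: count how many hashes it takes to find a valid nonce."""
    nonce = 0
    while not verify(challenge, nonce, difficulty):
        nonce += 1
    return nonce + 1


if __name__ == "__main__":
    challenge = "hypothetical-challenge"
    tries = attempts_needed(challenge, 3)
    print(f"client computed {tries} hashes; server re-checks with 1")
```

The server's cost stays flat per visitor, while the visitor's cost scales with the difficulty setting.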

[-] tofu@lemmy.nocturnal.garden 11 points 2 months ago

No, it works because the scraper bots don't have it implemented yet. Of course the companies would rather not spend additional compute resources, but their pockets are deep and some have already adapted and solve the challenges.

[-] Dremor@lemmy.world 13 points 2 months ago

Whether they solve it or not doesn't change the fact that they have to use more resources for crawling, which is the objective here. And by contrast, the website sees a lot less load than before deploying Anubis. In any case, I see it as a win.

But despite that, it has its detractors, like any solution that becomes popular.

But let's be honest, what are the arguments against it?
It takes a bit longer to access a site the first time? Sure, but it's not like you have to click or type anything.
It executes foreign code on your machine? Literally 90% of the web does these days. Just disable JavaScript and see how many websites are still functional. I'd be surprised if even a handful are.

The only ones who gain anything from a site not having Anubis are web crawlers, be they AI bots, indexing bots, or script kiddies trying to find a vulnerable target.

[-] EncryptKeeper@lemmy.world 12 points 2 months ago* (last edited 2 months ago)

The point was never that Anubis challenges are something scrapers can’t get past. The point is it’s expensive to do so.

Some bots don’t use JavaScript and can’t solve the challenges, so they’d be blocked, but there was never any point in time where no scrapers could solve them.

[-] possiblylinux127@lemmy.zip 15 points 2 months ago* (last edited 2 months ago)

Anubis sucks

However, the number of viable options is limited.

[-] seralth@lemmy.world 17 points 2 months ago

Yeah but at least Anubis is cute.

I'll take sucks but cute over dead internet and endless swarmings of zergling crawlers.

[-] CrackedLinuxISO@lemmy.dbzer0.com 12 points 2 months ago* (last edited 2 months ago)

There are some sites where Anubis won't let me through. Like, I just get immediately bounced.

So RIP dwarf fortress forums. I liked you.

[-] sem 10 points 2 months ago

I don't get it, I thought it allows all browsers with JavaScript enabled.

[-] mfed1122@discuss.tchncs.de 11 points 2 months ago* (last edited 2 months ago)

Yeah, well-written stuff. I think Anubis will come and go. This beautifully demonstrates and, best of all, quantifies the ~~negligence~~ negligible cost to scrapers of Anubis.

It's very interesting to try to think of what would work, even conceptually. Some sort of purely client-side captcha type of thing perhaps. I keep thinking about it in half-assed ways for minutes at a time.

Maybe something that scrambles the characters of the site according to some random "offset" of some sort, e.g. maybe randomly selecting a modulus size and an offset to cycle them, or even just a good ol' cipher. And the "captcha" consists of a slider that adjusts the offset. You as the viewer know it's solved when the text becomes something sensible, so there's no need for the client code to store a readable key that could be used to auto-undo the scrambling. You could maybe even have some values of the slider randomly chosen to produce English text if the scrapers got smart enough to check for legibility (not sure how to hide which slider positions would be these red herring ones though), which could maybe be enough to trick the scraper into picking up junk text sometimes.

[-] JadedBlueEyes@programming.dev 20 points 2 months ago

That kind of captcha is trivial to bypass via frequency analysis. Text that looks like language, as opposed to random noise, is very statistically recognisable.
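A minimal sketch of that attack, assuming the simplest version of the proposed scheme (a Caesar-style shift of the letters by a hidden offset). The frequency table holds approximate English letter percentages, and `shift_text`, `chi_squared` and `crack` are illustrative names.

```python
import string
from collections import Counter

# Approximate English letter frequencies, in percent.
ENGLISH_FREQ = {
    "e": 12.7, "t": 9.1, "a": 8.2, "o": 7.5, "i": 7.0, "n": 6.7, "s": 6.3,
    "h": 6.1, "r": 6.0, "d": 4.3, "l": 4.0, "c": 2.8, "u": 2.8, "m": 2.4,
    "w": 2.4, "f": 2.2, "g": 2.0, "y": 2.0, "p": 1.9, "b": 1.5, "v": 1.0,
    "k": 0.8, "j": 0.15, "x": 0.15, "q": 0.10, "z": 0.07,
}


def shift_text(text: str, offset: int) -> str:
    """Rotate ASCII letters by `offset`; leave everything else untouched."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - 97 + offset) % 26 + 97))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - 65 + offset) % 26 + 65))
        else:
            out.append(ch)
    return "".join(out)


def chi_squared(text: str) -> float:
    """Lower score means the letter distribution looks more like English."""
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    if not letters:
        return float("inf")
    counts = Counter(letters)
    total = len(letters)
    return sum(
        (counts.get(letter, 0) - total * pct / 100) ** 2 / (total * pct / 100)
        for letter, pct in ENGLISH_FREQ.items()
    )


def crack(scrambled: str) -> tuple[int, str]:
    """Try all 26 offsets and keep the one that scores most English-like."""
    best = min(range(26), key=lambda k: chi_squared(shift_text(scrambled, -k)))
    return best, shift_text(scrambled, -best)


if __name__ == "__main__":
    page = ("The purpose is to reduce the flood to a manageable level, "
            "not to block every single scraper request.")
    offset, recovered = crack(shift_text(page, 7))
    print(offset, recovered)
```

No human needs to move the slider: the scraper scores every position itself and picks the best one. Red-herring positions that decode to other English text would need a different check, but the basic scheme falls to a few lines of code.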

[-] drkt@scribe.disroot.org 19 points 2 months ago

That type of captcha already exists. I don't know about their specific implementation, but 4chan has it, and it is trivially bypassed by userscripts.

[-] dabe@lemmy.zip 6 points 2 months ago

I’m sure you meant to sound more analytical than anything… but this really comes off as arrogant.

You make the claim that Anubis is negligent and will come and go, and then admit to only spending minutes at a time thinking of solutions yourself, which you then just sorta spout. It’s fun to think about solutions to this problem collectively, but can you honestly believe that Anubis is negligent when it’s so clearly working and when the author has been so extremely clear about their own perception of its pitfalls and hasty development? (Go read their blog, it’s a fun time.)

[-] possiblylinux127@lemmy.zip 6 points 2 months ago

Anubis is more of an economic solution. It doesn't stop bots, but it does make companies pay more to access content instead of having server operators foot the bill.

[-] Lumisal@lemmy.world 9 points 2 months ago

Have you tried accessing it by using Nyarch?

[-] TwiddleTwaddle 7 points 2 months ago

I'm constantly unable to access Anubis sites on my primary mobile browser and have to switch over to Fennec.

[-] VitabytesDev@feddit.nl 6 points 2 months ago

I love that domain name.

this post was submitted on 21 Aug 2025
241 points (100.0% liked)

Selfhosted
