I don't see a problem here. Maybe Perplexity should consider the reasons WHY Cloudflare have a firewall...?
I don't like Cloudflare, but it's nice that they let people stop AI scraping if they want to.
CloudFlare has become an Internet protection racket and I'm not happy about it.
It's been this from the very beginning. But they don't fit the definition of a protection racket as they're not the ones attacking you if you don't pay up. So they're more like a security company that has no competitors due to the needed investment to operate.
Cloudflare are notorious for shielding cybercrime sites. You can't even complain to them about abuse of their service; they'll just forward your abuse complaint to the likely dodgy host of the cybercrime site. They don't even have a channel for complaints about network abuse of their DNS services.
So they certainly are an enabler of the cybercriminals they purport to protect people from.
Any internet service provider needs to be completely neutral. Not only in their actions, but also in their liability.
Same goes for other services like payment processors.
If companies that provide content-agnostic services are allowed to police the content, that opens the door to really nasty stuff.
You can't chop everyone's arms to stop a few people from stealing.
If they think their services are being used in a reprehensible manner, what they need to do is alert the authorities, not act like vigilantes.
If they acted differently, they'd probably be liable for illegal activity that they proxy for (this is for example relevant for the DMCA safe harbor).
Anyhow, when on their abuse page, I have an option for "Registrar", which is used for "DNS abuse", among others.
they're good at protecting websites but damn, having a company being MITM feels so wrong
The shit they know. Plus their support for non-JS users or Tor is pure shite
Yeah, a few sites outright refuse to work because cloudflare just poops. EDIT: It was supposed to say "loops", but I'm keeping it.
Uh, are they admitting they are trying to circumvent technological protections set up to restrict access to a system?
Isn’t that a literal computer crime?
No-no, see. When an AI-first company does it, it's actually called courageous innovation. Crimes are for poor people
See: Facebook/Meta
*puts on evil hat* Cloudflare should DRM their protection, then DMCA Perplexity and other US-based "AI" companies to oblivion. Side effect: might break the Internet.
The Internet was already ruined; Cloudflare is just band-aids on top of band-aids.
Worth it.
It's insane that anyone would side with Cloudflare here. To this day I can't visit many websites like nexusmods just because I run Firefox on Linux. The Cloudflare Turnstile just refreshes infinitely and has been doing so for months now.
Cloudflare is the biggest cancer on the web, fucking burn it.
Linux and Firefox here. No problem at all with Cloudflare, despite having more or less as many privacy-preserving add-ons as possible. I even spoof my user agent to the latest Firefox ESR on Linux.
Something may be wrong with your setup.
I suspect a lot of it comes down to your ISP. Like the original commenter, I also frequently can't pass the Cloudflare Turnstile when on Wi-Fi, although refreshing the page a few times usually gets me through. Worst case, on my phone's hotspot I can pass much more consistently. It's super annoying and, combined with their recent DNS outage, has totally ruined any respect I had for Cloudflare.
Interesting video on the subject: https://youtu.be/SasXJwyKkMI
That's not how it works. CF uses thousands of variables to estimate a trust score and block people, so just because it works for you doesn't mean it works for everyone.
Same goes the other way. Just because it doesn't work for you doesn't mean it should go away.
That technology has its uses, and Cloudflare is probably aware that there are still some false positives, and is probably working on it as we write.
The decision is the website owner's to make, weighing the advantage of filtering out a majority of bots against the disadvantage of losing some legitimate traffic to false positives. If you get a Cloudflare challenge, chances are they decided that the former vastly outweighs the latter.
Now there are some self-hosted alternatives, like Anubis, but business clients prefer SaaS like Cloudflare to having to maintain their own software. Once again, it is their choice and their liberty to do so.
lmao imagine shilling for corporate Cloudflare like this. Also, false positives and false negatives are fundamentally not equal.
> Cloudflare is probably aware that there are still some false positive, and probably is working on it as we write.
The main issue with Cloudflare is that it's mostly bullshit. It does not report any stats to the admins on how many users were rejected or on false positive rates, and happily puts everyone under the "evil bot" umbrella. So people from low trust score environments like Linux, or IPs from poorer countries, are at a significant disadvantage and left without a voice.
I'm literally a security dev working with Cloudflare anti-bot myself (not by choice). It's a useful tool for corporate but a really fucking bad one for the health of the web, much worse than any LLM agent or crawler, period.
Ah, the good old "you don't agree with me so you must be shilling for X" argument. I suppose you are shilling for the bots then, am I right?
> So people from low trust score environments like Linux
Linux user here, Cloudflare hasn't blocked access to a single page for me unless I use a VPN, which then can trigger it.
I'm on Linux with Firefox and have never had that issue before (particularly nexusmods which I use regularly). Something else is probably wrong with your setup.
Thirded. All three (Linux, FF, nexus)
ZERO ISSUES.
They can't get their AI to check a box that says "I am not a robot"? I'd think that'd be a first-year comp sci student level task. And robots.txt files were basically always voluntary compliance anyway.
Cloudflare actually fully fingerprints your browser and even sells that data. That's your IP, TLS, operating system, full browser environment, installed extensions, GPU capabilities etc. It's all tracked before the box even shows up; in fact, the box is there to give the runtime more time to fingerprint you.
Yeah and the worst part is it doesn't fucking work for the one thing it's supposed to do.
The only thing it does is stop the stupidest low-effort scrapers and force the good ones to use a browser.
Gee that's a real removed it ain't it perplexity?
ahahahahah, great, fck AI
💁u
Here, you dropped this!
Perplexity argues that a platform’s inability to differentiate between helpful AI assistants and harmful bots causes misclassification of legitimate web traffic.
So, I assume Perplexity uses appropriate identifiable user-agent headers, to allow hosters to decide whether to serve them one way or another?
And I'm assuming if the robots.txt states that their User-Agent isn't allowed to crawl, it obeys it, right? :P
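For anyone wondering what "obeying robots.txt" looks like on the crawler side, here's a rough sketch using Python's standard `urllib.robotparser`. The bot name `ExampleAIBot`, the `is_allowed` helper and the rules below are made up for illustration; this is not Perplexity's actual user agent or anyone's real robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named bot is banned outright,
# everyone else is only kept out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines; no network access needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleAIBot", "SomeBrowser"):
        for url in ("https://example.com/article",
                    "https://example.com/private/x"):
            print(agent, url, is_allowed(agent, url))
```

Note that this only works if the crawler sends an honest user agent and actually runs the check before fetching. Nothing in the protocol enforces it.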