
The New York Times tried to block the Internet Archive: another reason to value the latter (walledculture.org)

submitted 2 years ago by psychothumbs@lemmy.world to c/piracy@lemmy.dbzer0.com



pootriarch@poptalk.scrubbles.tech 4 points 2 years ago

"It exists, it's called a robots.txt file that the developers can put into place, and then bots like the webarchive crawler will ignore the content."

the internet archive doesn't respect robots.txt:

"Over time we have observed that the robots.txt files that are geared toward search engine crawlers do not necessarily serve our archival purposes."

the only way to stay out of the internet archive is to follow the process they created and hope they agree to remove you. or firewall them.

https://blog.archive.org/2017/04/17/robots-txt-meant-for-search-engines-dont-work-well-for-web-archives/
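for anyone wondering why robots.txt can't enforce anything: the file is only a request, and it's the crawler that decides whether to check it. a minimal sketch of what a compliant crawler does, using Python's standard `urllib.robotparser`. the robots.txt contents, the `ia_archiver` user-agent token, the example.com URLs, and the `is_allowed` helper are all assumptions made up for illustration, not the archive's actual configuration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask one crawler to stay out entirely,
# and ask everyone else to skip /private/.
# The "ia_archiver" token is an assumption for illustration only.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A polite crawler runs this check before every request.
    print(is_allowed(ROBOTS_TXT, "ia_archiver", "https://example.com/article"))
    print(is_allowed(ROBOTS_TXT, "SomeBot", "https://example.com/article"))
```

nothing on the server side runs this check. a crawler that never calls it fetches the page anyway, which is why the remaining options are the removal process or a firewall rule.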

this post was submitted on 12 Oct 2023

1168 points (100.0% liked)
