Internet Archive played crucial role in tracking shady CDC data removals (arstechnica.com)

submitted 5 hours ago by Powderhorn@beehaw.org to c/science@beehaw.org

PassingThrough@lemm.ee 11 points 3 hours ago

There are alternative archival sites, some of which operate beyond the reach of US tampering, but the IA is certainly the primary one.

Unfortunately, the IA is absolutely massive. Anyone backing things up is just grabbing what matters to them personally, hopefully in a way that lets the pieces be authenticated and reassembled later. Unlike with Wikipedia, we aren't talking about copies of the whole thing, not even close. I think they are near or recently past 100 petabytes? Much will be lost if and when the IA is eventually targeted and disabled for whatever reason they come up with.

If the IA were to be backed up at any meaningful scale, I'd think to ask the British to encourage the British Museum to live up to its stereotype of readily taking everything, and apply that to the internet. America can no longer be trusted to house an accurate history of anything.

mox@lemmy.sdf.org 8 points 2 hours ago (last edited 2 hours ago)

> There are alternative archival sites,

To be clear, other archive sites that take snapshots of web pages are not really alternatives to the Internet Archive, which (importantly) also allows uploads of arbitrary data for preservation. One example of this is mentioned in the article:

https://archive.org/details/20250128-cdc-datasets