submitted 2 years ago by rysiek@szmer.info to c/fediverse@lemmy.ml
[-] erpicht@lemmy.ml 8 points 2 years ago

And here's the point at which we go off the rails (towards the end of the thread; the earlier section is quite well expressed):

Most people in tech do not want to hear this, because it invalidates the vast majority of their business models, AI/ML training data, business intel operations, and so forth. Anything that's based on gathering data that is 'public' suddenly becomes suspect, if the above is applied.

And yes, that includes internet darlings like the Internet Archive, which also operates on a non-consensual, opt-out model.

It's the Western Acquisition, claiming ownership without permission.

It's so ingrained in white, Western internet culture that there are now whole generations who consider anything that can be read by the crawler they wrote in a weekend to be fair game, regardless of what the user's original intent was.

Republishing, reformatting, archiving, aggregating, all without the user being fully aware, because if they were, they would object.

It's dishonest as fuck, and no different from colonial attitudes towards natural resources.

"It's there, so we can take it."

We then have some reasonable responses from others in the thread:

Rich Felker @dalias@hachyderm.io

Re: Internet Archive, I think many of us don't believe/accept that businesses, organizations, genuine public figure politicians, etc. have a right to control how their publications of public relevance are archived & shared. The problem is that IA isn't able to mechanically distinguish between those cases and teenagers' personal diary-like blogs (chosen as example at opposite end of spectrum).

Arne Babenhauserheide @ArneBab@rollenspiel.social

*snip*

This is the difference between the internet archive and an ML model: the archive does not claim ownership.

Finally, a thought of mine own:

Sindarina seems to fundamentally miss the central idea of the world wide web, that is, publicly sharing information. This does not mean the work may be used for any purpose whatsoever, as the content of many websites is either copyrighted or CC-BY-SA. But publishing anything on the web or in print necessarily opens it to aggregation and archival. I routinely save webpages to disk.

To run with the cafe analogy that has been brought up, one cannot post a note to the cafe's bulletin board and at the same time expect that no one else may take a photo of it, then perhaps share it with some acquaintances.

This is a far cry from the data harvesting done by Google, Microsoft, Apple & co., or the dubiously collected data used to train "automated plagiarism engine[s]," as Arthur Besse put it not too long ago.

[-] alma@lemmy.ml 5 points 2 years ago

It's fair that maybe the architecture of public inbox/outbox protocols isn't suited for this kind of use (juxtapose with Matrix).

However, consider this: some people on the fediverse simply don't want to be indexed. Indexing should be opt-in instead of opt-out, for people who explicitly want it. People aren't against search, they're against non-consensual search.

I think it's important for the culture of the fediverse that such civility is encouraged. Because on the fediverse, the community can actually make a difference. By blocking federation with offenders, we can guide the culture of fedi. And it's better for it.

Running with the idea that people can "technically" do what they want because of the nature of the protocols is counter-productive, because we actually can do something.

All a search implementer has to do is adapt to that culture, and they'll be fine. So I don't see why there's such push-back against this viewpoint.

[-] erpicht@lemmy.ml 2 points 2 years ago

I would fully agree that other internet protocols are much better suited to information not meant to be broadcast publicly.

Civility is great, and should be highly encouraged. That's largely why I like Lemmy. Each instance can guide its community in line with its values, whatever those may be, block offenders, and generally forge the space it wishes.

However, I think Besse's comments on setting the correct expectations in the public sphere are worth considering.

For a different internet example: all the messages I send in any chatroom on an IRC server will inevitably be logged by someone, especially in popular rooms. Any assumption to the contrary would be naïve, and a demand that people not keep a log of any of my publicly broadcast messages would be laughed at by the operators. It's a public space, and sending anything to that space necessarily means I forgo my ability to control who sees, aggregates, archives, or shares that information. My choice to put the information into that space is the opt-in mechanism, just as publishing a book or giving an interview is offline.

It's not so much the protocol as it is how making things public fundamentally works.

this post was submitted on 27 Jan 2023