People are already complaining that AI training data from recent forums is "contaminated" with outputs from other AIs. If you want something "purely human" to work from, then historical pre-2023 data is the best bet.
In the final analysis, nobody cares what Harold Q. Dumpington bought from Amazon in the week of June 4, 2017. That information is technically still stored in Amazon's databases, but (1) Amazon already has access to it, so encryption is a sort of non-issue, and (2) nobody cares.
The reality is: socially engineering a password or setting up a "man in the middle" attack on a coffee shop's WiFi is a hell of a lot easier than attacking encrypted data, but even those attacks are relatively rare, and usually executed against corporations with money. As tempting as it would be for some hacker to get into Jennifer Lawrence's e-mail or Chris Pratt's Amazon purchase history, it seems that it's really not worth the effort to anybody, except in some edge cases.
Putting aside the whole question of what people might want to feed into an AI, why would anybody want that data AT ALL?
MC Frontalot has a song about this, Secrets from the Future.