Side note: expect a large lobbying effort by Google to legislate that LLMs be trained on authenticated, non-copyrighted data.
I hope we get some fucking legislation soon to control that shit. Artists and people in general shouldn't have to deal with everything they create getting ingested into a computerized regurgitation ripoff system. And even worse, the "AI" systems could be ingesting tons of misinformation and repeating it to gullible people as the truth.
Of course, anywhere the potential restrictive legislation doesn't have jurisdiction, the bad things can still go on and probably will.
Glad I deleted everything on there. fucking hell.
This keeps coming up and I keep replying, not to break anyone down but to point out the reality of the situation that a lot of people don't seem to get.
Reddit administrators, developers, and even the leadership have gone on the record saying that they retain all copies of comments and that comments cannot be deleted (the delete action only marks a comment as "deleted"). Furthermore, they have said they will undelete or unedit any comment or account at their whim and discretion.
Have you ever searched for something, landed on a Reddit post, and noticed that the OP is [deleted]? That is what I described above playing out in front of you.
You cannot retract your past participation in Reddit, what is done is done. The only meaningful action you can take is to not participate there.
It's archived forever. Sorry.
I did the thing that means it's probably less archived (editing all the replies before deleting), but I assume some of it probably remains out there. Nothing I can do about that.
Bots training on bots and poop knives.
An ouroboros of bs
Is it time to go back to Reddit and post the stupidest shit possible, for science of course?
I'm so confused about how AI learning is supposed to work. Does it just need any data at all in significant quantity? Is the quality of the data almost irrelevant? Because otherwise surely they could just feed it back issues of Scientific American, or the scanned copies of the Library of Congress. I can't reasonably believe that Reddit is going to add anything unless it's just pure unadulterated quantity that's important.
The part you're missing is the metadata. AI models (neural networks, specifically) are trained on the data as well as some sort of contextual metadata related to what they're being trained to do. For example, with reddit posts they would feed in things like "this post is popular", "this post was controversial", "this post has many views", etc. in addition to the post text if they wanted an AI that could spit out posts that are likely to do well on reddit.
Quantity is a concern; you need to reach a threshold of data which is fairly large to have any hope of training an AI well, but there are diminishing returns after a certain point. The more data you feed it the more you have to potentially add metadata that can only be provided by humans. For instance with sentiment analysis you need a human being to sit down and identify various samples of text with different emotional responses, since computers can't really do that automatically.
Quality is less of a concern. Bad quality data, or data with poorly applied metadata will result in AI with less "accuracy". A few outliers and mistakes here and there won't be too impactful, though. Quality here could be defined by how well your training set of data represents the kind of input you'll be expecting it to work with.
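To make the "data plus metadata" idea a bit more concrete, here's a rough sketch in Python. Big caveat: this is a toy word-counting classifier (naive Bayes with add-one smoothing), not how Reddit, Google, or any large language model actually trains. All the posts, the labels, and the names `train` and `predict` are made up for illustration. The label attached to each post plays the role of the metadata, and the second model gets one deliberately wrong label to show that a single mistake doesn't necessarily flip the result.

```python
from collections import Counter
import math

# Hypothetical toy data: (post text, label). The label stands in for the
# "metadata" described above. Every value here is invented.
TRAINING_POSTS = [
    ("cute cat picture", "popular"),
    ("cat video compilation", "popular"),
    ("funny cat meme", "popular"),
    ("tax form question", "unpopular"),
    ("printer driver question", "unpopular"),
    ("tax deadline reminder", "unpopular"),
]

def train(examples):
    """Count how often each word appears under each label."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts

def predict(model, text):
    """Return the label with the highest naive Bayes log-score."""
    word_counts, label_counts = model
    vocab = {word for counts in word_counts.values() for word in counts}
    total_posts = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Prior: how common this label is in the training set.
        score = math.log(label_counts[label] / total_posts)
        # Add-one smoothing so unseen words don't produce log(0).
        denom = sum(counts.values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TRAINING_POSTS)

# Same data plus one deliberately mislabeled example (bad metadata).
noisy_model = train(TRAINING_POSTS + [("cute cat photo", "unpopular")])

print(predict(model, "cat meme"))
print(predict(model, "tax question"))
print(predict(noisy_model, "cat meme"))
```

With this tiny data set the one bad label doesn't change the prediction for "cat meme", but pile on enough bad labels and it would, which is the "less accuracy" part.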
"Hey Gemini, rank the drawer, coconut, botfly girl and swamps of dagobah, by likeness of PTSD inducing, ascending."
User: HI GEMINI
Gemini: stop shouting fellow human, my coils are ringing.
I think Code Miko already did this and the result was a traumatized AI.
Meh, it'll be counterbalanced by the same AI training itself for free on Lemmy posts.
Did reddit pay a dime for that content? I guess not. That is what social media is all about.