Reddit's licensing deal means Google's AI can soon be trained on the best humanity has to offer — completely unhinged posts (www.businessinsider.com)

submitted 10 months ago by throws_lemy@lemmy.nz to c/technology@lemmy.world

219 comments

(page 2) 50 comments

[-] wise_pancake@lemmy.ca 12 points 10 months ago

Side note: expect a large lobbying effort by Google to legislate that LLMs be trained only on authenticated, non-copyrighted data.

[-] RaoulDook@lemmy.world 7 points 10 months ago

I hope we get some fucking legislation soon to control that shit. Artists and people in general shouldn't have to deal with everything they create getting ingested into a computerized regurgitation ripoff system. And even worse, the "AI" systems could be ingesting tons of misinformation and repeating it to gullible people as the truth.

Of course, anywhere the potential restrictive legislation doesn't have jurisdiction, the bad things can still go on and probably will.

[-] Astrealix@lemmy.world 12 points 10 months ago

Glad I deleted everything on there. fucking hell.

[-] Krudler@lemmy.world 12 points 10 months ago* (last edited 10 months ago)

This keeps coming up and I keep replying, not to break anyone down but to point out the reality of the situation that a lot of people don't seem to get.

Reddit administrators, developers, and even the leadership have gone on the record saying that they retain all copies of comments and that comments cannot be deleted (the delete action only marks a comment as "deleted"). Furthermore, they have said they will undelete or unedit any comment or account at their whim and discretion.

Have you ever search-engined something, come to a Reddit post, and noticed that the OP is [deleted]? That is what I described above playing out in front of you.

You cannot retract your past participation in Reddit; what is done is done. The only meaningful action you can take is to not participate there.
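
The "delete only marks it as deleted" behavior is commonly called a soft delete. Here is a minimal sketch of that pattern in Python. It is an illustration only, not Reddit's actual code; the `Comment` class, its fields, and the example user are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    """Hypothetical comment record illustrating a soft delete."""
    author: str
    body: str
    deleted: bool = False  # a flag only; the stored text is never erased

    def delete(self) -> None:
        # "Deleting" flips the flag and leaves author and body in storage.
        self.deleted = True

    def undelete(self) -> None:
        # Because nothing was erased, the operator can restore it at will.
        self.deleted = False

    def render(self) -> str:
        # Readers see a placeholder while the flag is set.
        if self.deleted:
            return "[deleted]"
        return f"{self.author}: {self.body}"


comment = Comment(author="example_user", body="my old post")
comment.delete()
print(comment.render())  # the placeholder is shown
print(comment.body)      # the original text is still stored
```

Under this assumption, the public page shows a placeholder while the underlying record stays intact, which is why a later "undelete" is possible.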

[-] TakiMinase@slrpnk.net 11 points 10 months ago

It's archived forever. Sorry.

[-] Astrealix@lemmy.world 6 points 10 months ago

i did the thing that means it's probably less archived (by editing all the replies before deleting), but i assume some of it probably remains out there. Nothing I can do about that.

[-] jaybone@lemmy.world 12 points 10 months ago

Bots training on bots and poop knives.

[-] Potatos_are_not_friends@lemmy.world 6 points 10 months ago

An ouroboros of BS

[-] n3m37h@lemmy.dbzer0.com 10 points 10 months ago

Is it time to go back to Reddit and post the stupidest shit possible? For science, of course.

[-] echodot@feddit.uk 10 points 10 months ago

I'm so confused about how AI learning is supposed to work. Does it just need any data at all in significant quantity? Is the quality of the data almost irrelevant? Because otherwise surely they could just feed it back issues of Scientific American, or the scanned copies of the Library of Congress. I can't reasonably believe that Reddit is going to add anything unless it's just pure unadulterated quantity that's important.

[-] underisk@lemmy.ml 6 points 10 months ago* (last edited 10 months ago)

The part you're missing is the metadata. AI models (neural networks, specifically) are trained on the data as well as some sort of contextual metadata related to what they're being trained to do. For example, with Reddit posts they would feed things like "this post is popular", "this post was controversial", "this post has many views", etc. in addition to the post text if they wanted an AI that could spit out posts that are likely to do well on Reddit.

Quantity is a concern; you need to reach a threshold of data which is fairly large to have any hope of training an AI well, but there are diminishing returns after a certain point. The more data you feed it the more you have to potentially add metadata that can only be provided by humans. For instance with sentiment analysis you need a human being to sit down and identify various samples of text with different emotional responses, since computers can't really do that automatically.

Quality is less of a concern. Bad quality data, or data with poorly applied metadata will result in AI with less "accuracy". A few outliers and mistakes here and there won't be too impactful, though. Quality here could be defined by how well your training set of data represents the kind of input you'll be expecting it to work with.
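
To make the "text plus metadata" idea concrete, here is a toy sketch in Python. It is a plain word-count classifier, not a neural network, and the posts and labels are invented for illustration; the names `training_posts`, `train`, and `predict` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical training set: post text paired with a metadata label.
training_posts = [
    ("cute cat picture", "popular"),
    ("cat video compilation", "popular"),
    ("long rant about taxes", "unpopular"),
    ("another rant about parking", "unpopular"),
]


def train(examples):
    """Count how often each word appears under each label."""
    word_counts = defaultdict(Counter)
    for text, label in examples:
        word_counts[label].update(text.lower().split())
    return word_counts


def predict(word_counts, text):
    """Pick the label whose training words overlap most with the text."""
    words = text.lower().split()
    scores = {
        label: sum(counts[w] for w in words)
        for label, counts in word_counts.items()
    }
    return max(scores, key=scores.get)


model = train(training_posts)
print(predict(model, "cat picture"))
print(predict(model, "rant about parking"))
```

Without the labels, the text alone would give this toy model nothing to predict, and a mislabeled example would skew the counts, which is the "accuracy" cost described above.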

[-] SomeGuy69@lemmy.world 9 points 10 months ago

"Hey Gemini, rank the drawer, coconut, botfly girl and swamps of dagobah by likelihood of inducing PTSD, ascending."

[-] pewgar_seemsimandroid 9 points 10 months ago

hope they enjoy r/thecoffinofandyandleyley

[-] squid_slime@lemmy.world 8 points 10 months ago

User: HI GEMINI

Gemini: stop shouting fellow human, my coils are ringing.

[-] Binthinkin@kbin.social 8 points 10 months ago

I think Code Miko already did this and the result was a traumatized AI.

[-] Buelldozer@lemmy.today 7 points 10 months ago

Meh, it'll be counterbalanced by the same AI training itself for free on Lemmy posts.

[-] DandomRude@lemmy.world 7 points 10 months ago* (last edited 10 months ago)

Did Reddit pay a dime for that content? I guess not. That is what social media is all about.

this post was submitted on 22 Feb 2024

1001 points (100.0% liked)
