[-] westingham@sh.itjust.works 34 points 1 month ago

I was reading the article and thinking "suck a dick, AI companies" but then it mentions the EFF and ALA filed against the class action. I have found those organizations to be generally reputable and on the right side of history, so now I'm wondering what the problem is.

[-] kibiz0r@midwest.social 39 points 1 month ago

They don’t want copyright power to expand further. And I agree with them, despite hating AI vendors with a passion.

For an understanding of the collateral damage, check out How To Think About Scraping by Cory Doctorow.

[-] Jason2357@lemmy.ca 13 points 1 month ago

Take scraping. Companies like Clearview will tell you that scraping is legal under copyright law. They’ll tell you that training a model with scraped data is also not a copyright infringement. They’re right.

I love Cory's writing, but while he does a masterful job of defending scraping, and makes a good argument that in most cases, it's laws other than Copyright that should be the battleground, he does, kinda, trip over the main point.

That is that training models on creative works and then selling access to the derivative "creative" works that those models output very much falls within the domain of copyright - on either side of a grey line we usually call "fair use" that hasn't been really tested in courts.

Let's take two absurd extremes to make the point. Say I train an LLM directly on Marvel movies, and then sell movies (or maybe movie scripts) that are almost identical to existing Marvel movies (maybe with a few key names and features altered). I don't think anyone would argue that is not a derivative work, or that it falls under "fair use." However, if I used literature to train my LLM to be able to read, and used that to read street signs for my self-driving car, well, yeah, that might be something you could argue is "fair use" to sell. It's not producing copy-cat literature.

I agree with Cory that scraping, per se, is absolutely fine, as is re-distributing the results in ways that are in the public interest or fall under "fair use", but it's hard to justify the slop machines as not a copyright problem.

In the end, yeah, fuck both sides anyway. Copyright was extended too far and used for far too much, and the AI companies are absolute thieves. I have no illusions this type of court case will do anything more than shift wealth from one robber baron to another, and it won't help artists and authors.

[-] kibiz0r@midwest.social 4 points 1 month ago

I agree, and I think your points line up with Doctorow’s other writing on the subject. It’s just hard to cover everything in one short essay.

[-] thesohoriots@lemmy.world 4 points 1 month ago

Let’s give them this one last win. For spite.

[-] westingham@sh.itjust.works 3 points 1 month ago

Ahhh, it makes more sense now. Thank you!

[-] peoplebeproblems@midwest.social 8 points 1 month ago

I disagree with the EFF and ALA on this one.

These were entire sets of writing consumed and reworked into poor data without respecting the license to them.

Honestly, I wouldn't be surprised if copyright weren't the only problem here, but intellectual property as well. In that case, EFF probably has an interest in that instead. Regardless, I really think it needs to be brought through court.

LLMs are harmful, full stop. Most other Machine Learning mechanisms use licensed data to train. In the case of software as a medical device, such as image analysis AI, that data is protected by HIPAA and special care is already taken in order to use it.

[-] vala@lemmy.dbzer0.com 1 points 1 month ago

My guess is that the EFF is mostly concerned with the fact this is a class action and also worried about expanding copyright in general.

[-] pelya@lemmy.world 6 points 1 month ago

AI coding tools are using the exact same backends as AI fiction writing tools, so it would hurt the fledgling vibe coder profession (which according to proper software developers should not be allowed to exist at all).

The same goes for the Internet Archive - if scraping is illegal, then the Internet Archive is as well.

this post was submitted on 09 Aug 2025
831 points (100.0% liked)
