"because it's supposedly "impossible" for the company to train its artificial intelligence models — and continue growing its multi-billion-dollar-business — without them."
Oh no! The poor rich can't get richer fast enough :(
So I got a crazy idea - hear me out - how about we just abolish copyright completely, for everyone?
I mean, it works in China pretty well.
https://en.wikipedia.org/wiki/Intellectual_property_in_China
Looks like there are still copyright laws in China. What are you on about?
It's impossible for me to make money without robbing a bank, so please let me do that, parliament, it would be so funny.
Oh no. Anyway...
Copyright is a pain in the ass, but Sam Altman is a bigger pain in the ass. Send him to prison and let him rot. Then put his tears in a cup and I'll drink them.
What irks me most about this claim from OpenAI and others in the AI industry is that it's not based on any real evidence. Nobody has tested the counterfactual approach OpenAI claims wouldn't work, yet the experiments that came closest (the first StarCoder LLM and the CommonCanvas text-to-image model) suggest that, in fact, it would have been possible to produce something very nearly as useful, and in some ways better, with a more restrained training data curation approach than scraping outbound Reddit links.
All that aside, copyright clearly isn't the right framework for understanding why what OpenAI does bothers people so much. It's really about "data dignity", which is a relatively new moral principle not yet protected by any single law. Most people feel that they should have control over what data is gathered about their activities online, as well as what is done with that data after it's been collected. Even if they publish or post something under a Creative Commons license that permits derived uses of their work, they'll still get upset if it's used as an input to machine learning. This is true even if the resulting generative models are not created for commercial reasons, but only for personal or educational purposes that clearly constitute fair use. I'm not saying that OpenAI's use of copyrighted work is fair; I'm just saying that even in cases where the use is clearly fair, there's still a perceived moral injury, so I don't think it's wise to lean too heavily on copyright law if we want to find a path forward that feels just.
The internet has been primarily derivative content for a long time, as much as some haven't wanted to admit it. These fancy algorithms now take that to an exponential degree.
Original content had already become a rare sight as monetization ramped up. And then this generation of AI algorithms arrived.
In the several years prior to LLMs becoming a thing, the internet was basically just regurgitating data from API calls, or scraping someone else's content and re-presenting it in your own way.
We can't make money paying for "AI", going to theaters, or paying for streaming services.
So I guess everybody gets a piracy!
Aww poor shit company and their poor money problems.
well fuck you Sam Altman
What kind of a pathetic statement is that?
I feel we need a term for "copyright bros".
The more important point is that social media companies can claim to OWN all the content needed to train AI. Same for image sites. That means they get to own the AI models. That means the models will never be free. Which means they control the "means of generation". That means that forever and ever and ever most human labour will be worth nothing while we can't even legally use this power. Double fucked.
YOU, the user/product, will not gain anything from this copyright strong-arming.
And to the argument itself: just because AI is better at learning from existing works, faster, more complete, with better memory, doesn't mean that it's fundamentally different from humans learning from artwork. Almost EVERY artist arguing for this is "stealing" themselves, since they learned from and were inspired by existing works.
But I guess the worst possible outcome is inevitable now.
As written, the headline is pretty bad, but it seems their argument is that they should be able to train on publicly available copyrighted information, like blog posts and social media, and not on private copyrighted information like movies or books.
You can certainly argue that "downloading public copyrighted information for the purposes of model training" should be treated differently from "downloading public copyrighted information for the intended use of the copyright holder", but it feels disingenuous to put this comment itself, to which someone holds a copyright, into the same category as something not shared publicly, like a paid article or a book.
Personally, I think it's a lot like search engines. If you make something public, someone can analyze it, link to it, or take other derivative actions, but they can't copy it and share the copy with others.
Then go out of business.
Literally, "fuck you go die" situation.