880
submitted 1 year ago by L4s@lemmy.world to c/technology@lemmy.world

Thousands of authors demand payment from AI companies for use of copyrighted works

Thousands of published authors are requesting payment from tech companies for the use of their copyrighted works in training artificial intelligence tools, marking the latest intellectual property critique to target AI development.

top 50 comments
[-] cerevant@lemmy.world 89 points 1 year ago

There is already a business model for compensating authors: it is called buying the book. If the AI trainers are pirating books, then yeah - sue them.

There are plagiarism and copyright laws to protect the output of these tools: if the output is infringing, then sue them. However, if the output of an AI would not be considered infringing for a human, then it isn’t infringement.

When you sell a book, you don’t get to control how that book is used. You can’t tell me that I can’t quote your book (within fair use restrictions). You can’t tell me that I can’t refer to your book in a blog post. You can’t dictate who may and may not read a book. You can’t tell me that I can’t give a book to a friend. Or an enemy. Or an anarchist.

Folks, this isn’t a new problem, and it doesn’t need new laws.

[-] Dark_Arc@lemmy.world 68 points 1 year ago

It's 100% a new problem. There's established precedent for things costing different amounts depending on their intended use.

For example, buying a consumer copy of a song doesn't give you the right to play that song in a stadium or a restaurant.

Training an entire AI to make a potentially infinite number of derived works from your work is 100% worthy of requiring a special agreement. This even goes beyond simple payment, to consent: a climate expert might not want their work in an AI that might severely mischaracterize its conclusions, or might want to require that certain queries are regularly checked by a human, etc.

[-] scarabic@lemmy.world 48 points 1 year ago

When you sell a book, you don’t get to control how that book is used.

This is demonstrably wrong. You cannot buy a book, and then go use it to print your own copies for sale. You cannot use it as a script for a commercial movie. You cannot go publish a sequel to it.

Now please just try to tell me that AI training is specifically covered by fair use and satire case law. Spoiler: you can’t.

This is a novel (pun intended) problem space and deserves to be discussed and decided, like everything else. So yeah, your cavalier dismissal is cavalierly dismissed.

[-] Zormat 16 points 1 year ago

I completely fail to see how it wouldn't be considered transformative work

[-] scarabic@lemmy.world 17 points 1 year ago

It fails the transcendence criterion. Transformative works go beyond the original purpose of their source material to produce a whole new category of thing or benefit that would otherwise not be available.

Taking 1000 fan paintings of Sauron and using them in combination to create 1 new painting of Sauron in no way transcends the original purpose of the source material. The AI painting of Sauron isn’t some new and different thing. It’s an entirely mechanical iteration on its input material. In fact the derived work competes directly with the source material which should show that it’s not transcendent.

We can disagree on this and still agree that it's debatable and should be decided in court. The person above that I'm responding to just wants to say "bah!" and dismiss the whole thing. If we can litigate the issue right here, a bar I believe this thread has already met, then judges and lawmakers should litigate it in our institutions. After all, the potential scale of this far-reaching issue is enormous. I think it's incredibly irresponsible to say "feh, nothing new here, move on."

[-] jecxjo@midwest.social 12 points 1 year ago

Typically the argument has been "a robot can't make transformative works because it's a robot." People think our brains are special when in reality they are just really lossy.

[-] cloudless@feddit.uk 19 points 1 year ago

I asked Bing Chat for the 10th paragraph of the first Harry Potter book, and it gave me this:

"He couldn’t know that at this very moment, people meeting in secret all over the country were holding up their glasses and saying in hushed voices: ‘To Harry Potter – the boy who lived!’"

It looks like technically I might be able to obtain the entire book (eventually) by asking Bing the right questions?

[-] assassin_aragorn@lemmy.world 17 points 1 year ago

However, if the output of an AI would not be considered infringing for a human, then it isn’t infringement.

It's an algorithm that's been trained on numerous pieces of media by a company looking to make money of it. I see no reason to give them a pass on fairly paying for that media.

You can see this if you reverse the comparison, and consider what a human would do to accomplish the task in a professional setting. That's all an algorithm is. An execution of programmed tasks.

If I gave a worker a pirated link to several books and scientific papers in the field, and asked them to synthesize an overview/summary of what they read and publish it, I'd get my ass sued. I have to buy the books and the scientific papers. STEM companies regularly pay for access to papers and codes and standards. Why shouldn't an AI have to do the same?

[-] bouncing@partizle.com 12 points 1 year ago

If I gave a worker a pirated link to several books and scientific papers in the field, and asked them to synthesize an overview/summary of what they read and publish it, I’d get my ass sued. I have to buy the books and the scientific papers.

Well, if OpenAI knowingly used pirated work, that's one thing. It seems pretty unlikely and certainly hasn't been proven anywhere.

Of course, they could have done so unknowingly. For example, if John C Pirate published the transcripts of every movie since 1980 on his website, and OpenAI merely crawled his website (in the same way Google does), it's hard to make the case that they're really at fault any more than Google would be.

[-] volkhavaar@lemmy.world 16 points 1 year ago

This is a little off, when you quote a book you put the name of the book you’re quoting. When you refer to a book, you, um, refer to the book?

I think the gist of these authors complaints is that a sort of “technology laundered plagiarism” is occurring.

[-] Durotar@lemmy.ml 57 points 1 year ago

How can they prove that not some abstract public data has been used to train algorithms, but their particular intellectual property?

[-] squaresinger@feddit.de 75 points 1 year ago

Well, if you ask e.g. ChatGPT for the lyrics to a song or page after page of a book, and it spits them out 1:1 correct, you could assume that it must have had access to the original.

[-] dojan@lemmy.world 34 points 1 year ago

Or at least excerpts from it. But even then, it's one thing for a person to put up a quote from their favourite book on their blog, and a completely different thing for a private company to use that data to train a model, and then sell it.

[-] BrooklynMan@lemmy.ml 14 points 1 year ago

there are a lot of possible ways to audit an AI for copyrighted works, several of which have been proposed in the comments here, but what this could lead to is laws requiring an accounting log of all material that has been used to train an AI as well as all copyrights and compensation, etc.
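
One of the simpler audit ideas floated in this thread — checking whether a model reproduces long verbatim spans of a protected text — can be sketched in a few lines. This is an illustrative toy, not any real auditing tool; the function names and the span length are assumptions:

```python
def ngrams(words, n):
    """All contiguous word n-grams in a token list."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def verbatim_overlap(model_output: str, source_text: str, n: int = 8) -> bool:
    """Flag the output if it shares any n-word contiguous span with the
    copyrighted source -- a crude check for memorized training text."""
    out_words = model_output.lower().split()
    src_words = source_text.lower().split()
    return bool(ngrams(out_words, n) & ngrams(src_words, n))

source = "people meeting in secret all over the country were holding up their glasses"
reply = "it said that people meeting in secret all over the country were toasting"
print(verbatim_overlap(reply, source, n=6))  # True: a six-word span is copied verbatim
```

In practice, any such audit would still need the accounting log described above, since output overlap alone can't distinguish licensed from unlicensed training material.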

[-] novibe@lemmy.ml 50 points 1 year ago

You know what would solve this? We all collectively agree this fucking tech is too important to be in the hands of a few billionaires, start an actual public free open source fully funded and supported version of it, and use it to fairly compensate every human being on Earth according to what they contribute, in general?

Why the fuck are we still allowing a handful of people to control things like this??

[-] joe@lemmy.world 38 points 1 year ago

All this copyright/AI stuff is so silly and a transparent money grab.

They're not worried that people are going to ask the LLM to spit out their book; they're worried that they will no longer be needed because a LLM can write a book for free. (I'm not sure this is feasible right now, but maybe one day?) They're trying to strangle the technology in the courts to protect their income. That is never going to work.

Notably, there is no "right to control who gets trained on the work" aspect of copyright law. Obviously.

[-] DandomRude@lemmy.world 16 points 1 year ago

There is nothing silly about that. It's a fundamental question about using content of any kind to train artificial intelligence that affects way more than just writers.

[-] HiddenLayer5@lemmy.ml 37 points 1 year ago

Someone should AGPL their novel and force the AI company to open source their entire neural network.

[-] Cstrrider@lemmy.world 35 points 1 year ago

While I am rooting for authors to make sure they get what they deserve, I feel like there is a bit of a parallel to textbooks here. As an engineer, if I learn about statics from a textbook and then go use that knowledge to help design a bridge that I and my company profit from, the textbook company can't sue. If my textbook has a detailed example for how to build a new bridge across the Tacoma Narrows, and I use all of the same design parameters for a real Tacoma Narrows bridge, that may have much more of a case.

[-] Colorcodedresistor@lemm.ee 32 points 1 year ago

This is a good debate about copyright/ownership. On one hand, yes, the authors' works went into 'training' the AI. But we would need a scale to grade how good a source piece is at being absorbed by the AI's learning. For example: did the AI learn more from the MAD magazine I just fed it, or did it learn more from Moby Dick? Who gets to determine that grading system? Sadly, musicians know this struggle. There are only so many notes and so many words; eventually overlap and similarities occur. But did that musician steal a riff, or did both musicians arrive at a similar riff separately? Authors don't own words or letters, so a computer that just copies those words and then uses an algorithm to write up something else is no different than you or I being influenced by our favorite heroes or information we have been given. Do I pay the author for reading his book? Or do I just pay the store to buy it?

[-] bouncing@partizle.com 27 points 1 year ago

Isn’t learning the basic act of reading text? I’m not sure what the AI companies are doing is completely right but also, if your position is that only humans can learn and adapt text, that broadly rules out any AI ever.

[-] BrooklynMan@lemmy.ml 26 points 1 year ago* (last edited 1 year ago)

Isn’t learning the basic act of reading text?

not even close. that's not how AI training models work, either.

if your position is that only humans can learn and adapt text

nope-- their demands are right at the top of the article and in the summary for this post:

Thousands of published authors are requesting payment from tech companies for the use of their copyrighted works in training artificial intelligence tools

that broadly rules out any AI ever

only if the companies training AI refuse to pay

[-] bouncing@partizle.com 17 points 1 year ago* (last edited 1 year ago)

Isn’t learning the basic act of reading text?

not even close. that’s not how AI training models work, either.

Of course it is. It's not a 1:1 comparison, but the way generative AI works and the way we incorporate styles and patterns are more similar than not. Besides, if a tensorflow script more closely emulated a human's learning process, would that matter to you? I doubt that very much.

Thousands of published authors are requesting payment from tech companies for the use of their copyrighted works in training artificial intelligence tools

Having to individually license each unit of work for a LLM would be as ridiculous as trying to run a university where you have to individually license each student reading each textbook. It would never work.

What we're broadly talking about is generative work. That is, by absorbing a body of work, the model incorporates it into an overall corpus of learned patterns. That's not materially different from how anyone learns to write. Even my use of the word "materially" in the last sentence is, surely, based on seeing it used in similar patterns of text.

The difference is that a human's ability to absorb information is finite and bounded by the constraints of our experience. If I read 100 science fiction books, I can probably write a new science fiction book in a similar style. The difference is that I can only do that a handful of times in a lifetime. A LLM can do it almost infinitely and then have that ability reused by any number of other consumers.

There's a case here that the remuneration process we have for original work doesn't fit well into the AI training models, and maybe Congress should remedy that, but on its face I don't think it's feasible to just shut it all down. Something of a compulsory license model, with the understanding that AI training is automatically fair use, seems more reasonable.

[-] BrooklynMan@lemmy.ml 14 points 1 year ago* (last edited 1 year ago)

Of course it is. It’s not a 1:1 comparison

no, it really isn't--it's not a 1000:1 comparison. AI generative models are advanced relational algorithms and databases. they don't work at all the way the human mind does.

but the way generative AI works and the way we incorporate styles and patterns are more similar than not. Besides, if a tensorflow script more closely emulated a human's learning process, would that matter to you? I doubt that very much.

no, the results are just designed to be familiar because they're designed by humans, for humans to be that way, and none of this has anything to do with this discussion.

Having to individually license each unit of work for a LLM would be as ridiculous as trying to run a university where you have to individually license each student reading each textbook. It would never work.

nobody is saying it should be individually-licensed. these companies can get bulk license access to entire libraries from publishers.

That’s not materially different from how anyone learns to write.

yes it is. you're just framing it in those terms because you don't understand the cognitive processes behind human learning. but if you want to make a meta comparison between the cognitive processes behind human learning and the training processes behind AI generative models, please start by citing your sources.

The difference is that a human’s ability to absorb information is finite and bounded by the constraints of our experience. If I read 100 science fiction books, I can probably write a new science fiction book in a similar style. The difference is that I can only do that a handful of times in a lifetime. A LLM can do it almost infinitely and then have that ability reused by any number of other consumers.

this is not the difference between humans and AI learning, this is the difference between human and computer lifespans.

There's a case here that the remuneration process we have for original work doesn't fit well into the AI training models

no, it's a case of your lack of imagination and understanding of the subject matter

and maybe Congress should remedy that

yes

but on its face I don’t think it’s feasible to just shut it all down.

nobody is suggesting that

Something of a compulsory license model, with the understanding that AI training is automatically fair use, seems more reasonable.

lmao

[-] bouncing@partizle.com 12 points 1 year ago

You're getting lost in the weeds here and completely misunderstanding both copyright law and the technology used here.

First of all, copyright law does not care about the algorithms used and how well they map to what a human mind does. That's irrelevant. There's nothing in particular about copyright that applies only to humans but not to machines. Either a work is transformative or it isn't. Either it's derivative or it isn't.

What AI is doing is incorporating individual works into a much, much larger corpus of writing style and idioms. If a LLM sees an idiom used a handful of times, it might start using it where the context fits. If a human sees an idiom used a handful of times, they might do the same. That's true regardless of algorithm and there's certainly nothing in copyright or common sense that separates one from another. If I read enough Hunter S Thompson, I might start writing like him. If you feed an LLM enough of the same, it might too.

Where copyright comes into play is in whether the new work produced is derivative or transformative. If an entity writes and publishes a sequel to The Road, Cormac McCarthy's estate is owed some money. If an entity writes and publishes something vaguely (or even directly) inspired by McCarthy's writing, no money is owed. How that work came to be (algorithms or human flesh) is completely immaterial.

So it's really, really hard to make the case that there's any direct copyright infringement here. Absorbing material and incorporating it into future works is what the act of reading is.

The problem is that as a consumer, if I buy a book for $12, I'm fairly limited in how much use I can get out of it. I can only buy and read so many books in my lifetime, and I can only produce so much content. The same is not true for an LLM, so there is a case that Congress should charge them differently for using copyrighted works, but the idea that OpenAI should have to go to each author and negotiate each book would really just shut the whole project down. (And no, it wouldn't be directly negotiated with publishers, as authors often retain the rights to deny or approve licensure).

[-] goetzit@lemmy.world 12 points 1 year ago* (last edited 1 year ago)

Okay, given that AI models need to look over hundreds of thousands if not millions of documents to get to a decent level of usefulness, how much should the author of each individual work get paid out?

Even if we say we are going to pay out a measly dollar for every work it looks over, you’re immediately talking millions of dollars in operating costs. Doesn’t this just box out anyone who can’t afford to spend tens or even hundreds of millions of dollars on AI development? Maybe good if you’ve always wanted big companies like Google and Microsoft to be the only ones able to develop these world-altering tools.

Another issue: who decides which works are more valuable, or how? Is a Shel Silverstein book worth less than a Mark Twain novel because it contains fewer words? If I self-publish a book, is it worth as much as Mark Twain's? Sure, his is more popular, but maybe mine is longer and contains more content. What's my payout in this scenario?

[-] BrooklynMan@lemmy.ml 15 points 1 year ago* (last edited 1 year ago)

i admit it's a huge issue, but the licensing costs are something that can be negotiated by the license holders in a structured settlement.

moving forward, AI companies can negotiate licensing deals for access to licensed works for AI training, and authors of published works can decide whether they want to make their works available to AI training (and their compensation rates) in future publishing contracts.

the solutions are simple-- the AI companies like OpenAI, Google, et al are just complaining because they don't want to fork over money to the copyright holders they ripped off and set a precedent that what they're doing is wrong (legally or otherwise).

[-] TendieMaster69@midwest.social 26 points 1 year ago

Yea sure, right after Google and Amazon pay me for all the data they've stolen from me. LOL

[-] linearchaos@lemmy.world 22 points 1 year ago

I don't know how I feel about this honestly. AI took a look at the book and added the statistics of all of its words into its giant statistic database. It doesn't have a copy of the book. It's not capable of rewriting the book word for word.

This is basically what humans do. A person reads 10 books on a subject, becomes somewhat of a subject matter expert, and writes their own book.

Artists use reference art all the time. As long as they don't get too close to the original reference, nobody throws any flags.

These people are scared for their viability in their user space and they should be, but I don't think trying to put this genie back in the bottle or extra charging people for reading their stuff for reference is going to make much difference.

[-] BartsBigBugBag@lemmy.tf 17 points 1 year ago

It’s not at all like what humans do. It has no understanding of any concepts whatsoever, it learns nothing. It doesn’t know that it doesn’t know anything even. It’s literally incapable of basic reasoning. It’s essentially taken words and converted them to numbers, and then it examines which string is likely to follow each previous string. When people are writing, they aren’t looking at a huge database of information and determining the most likely word to come next, they’re synthesizing concepts together to create new ones, or building a narrative based on their notes. They understand concepts, they understand definitions. An AI doesn’t, it doesn’t have any conceptual framework, it doesn’t even know what a word is, much less the definition of any of them.
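
For what it's worth, the mechanism this comment describes — picking a statistically likely next word given the words so far — can be illustrated with a toy bigram table. This is a deliberately crude sketch of the idea; real LLMs learn dense neural representations, not a lookup table like this:

```python
from collections import Counter, defaultdict

def train_bigrams(corpus: str):
    """Count which word follows which -- a toy stand-in for the
    'statistics database' notion; real models learn representations."""
    words = corpus.split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def most_likely_next(counts, word: str) -> str:
    """Return the continuation seen most often in training."""
    return counts[word].most_common(1)[0][0]

corpus = "the boy who lived the boy who cried wolf the boy who lived"
model = train_bigrams(corpus)
print(most_likely_next(model, "who"))  # "lived" (seen twice, vs "cried" once)
```

Whether chaining predictions like this amounts to "understanding" is exactly the disagreement running through this thread.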

[-] randon31415@lemmy.world 18 points 1 year ago

[-] adibis@lemmy.world 17 points 1 year ago

This is so stupid. If I read a book and get inspired by it and write my own stuff, as long as I'm not using the copyrighted characters, I don't need to pay anyone anything other than purchasing the book which inspired me originally.

If this were the law, wouldn't pretty much every modern-day fantasy author have to pay the Tolkien foundation, and every nonfiction author have to pay for each citation?

[-] mayo@lemmy.world 17 points 1 year ago* (last edited 1 year ago)

I think this is more about frustration experienced by artists in our society at being given so little compensation.

The answer is staring us in the face. UBI goes hand in hand with developments in AI. Give artists a basic salary from the government so they can afford to live well. This isn't an AI problem; this is a broken-society problem. I support artists advocating for themselves, but the fact that they aren't asking for UBI really speaks to how hopeless our society feels right now.

[-] Buttons@programming.dev 16 points 1 year ago

This is tough. I believe there is a lot of unfair wealth concentration in our society, especially in the tech companies. On the other hand, I don't want AI to be stifled by bad laws.

If we try to stop AI, it will only take it away from the public. The military will still secretly use it, companies might still secretly use it. Other countries will use it and their populations will benefit while we languish.

Our only hope for a happy ending is to let this technology be free and let it go into the hands of many companies and many individuals (there are already decent models you can run on your own computer).

[-] deaf_fish@lemm.ee 13 points 1 year ago

So, in your "only hope for a happy ending" scenario, how do the artists get paid? Or will we no longer need them after AI runs everything ;)

[-] just_change_it@lemmy.world 15 points 1 year ago

What did you pay the author of the books and papers published that you used as sources in your own work? Do you pay those authors each time someone buys or reads your work? At most you pay $0-$15 for a book anyway.

As for free advertising when your source material is used: if your material is a good source and someone asks, say, ChatGPT for a book or paper on the topic, shouldn't your work be mentioned if you have written something useful? Assuming it doesn't hallucinate.

[-] assassin_aragorn@lemmy.world 14 points 1 year ago

That's the "paid in exposure" argument.

And I'm not sure what my company pays, but they purchase access to scientific papers and industrial standards. The market price I've seen for them is hundreds of dollars. You either pay an ongoing subscription to access the information, or you pay a larger lump sum to own a copy that cannot legally be reproduced.

Companies pay for this sort of thing. AI shouldn't get an exception.

this post was submitted on 26 Jul 2023
880 points (100.0% liked)
