submitted 3 months ago* (last edited 3 months ago) by fossilesque@mander.xyz to c/technology@lemmy.world
[-] DrCake@lemmy.world 337 points 3 months ago

So when’s the ruling against OpenAI and the like using the same copyrighted material to train their models

[-] irotsoma@lemmy.world 131 points 3 months ago

But OpenAI not being allowed to use the content for free means they are being prevented from making a profit, whereas the Internet Archive is giving away the stuff for free and taking away the right of the authors to profit. /s

Disclaimer: this is the argument that OpenAI is using currently, not my opinion.

[-] norimee@lemmy.world 84 points 3 months ago* (last edited 3 months ago)

Ah, I see you got that all wrong.

Open ~~IA~~ AI uses that content to generate billions in profit on the backs of The People. The Internet Archive just does it for the good of The People.

We can't have that. "Good for The People" is not how the economy works, pal. We need profit and exploitation for the world to work...

[-] v_krishna@lemmy.ml 23 points 3 months ago

OpenAI is burning billions of dollars not making profit.

[-] MigratingtoLemmy@lemmy.world 178 points 3 months ago

If OpenAI can get away with going through copyrighted material, then the answer to piracy is simple: round up a bunch of talented devs from the internet who are writing and training AI models, and let's make a fantastic model trained on what the Internet Archive has. Tell you what, let Mistral's engineers lead that charge, and put an AGPL license on the project so that companies can't fuck us over.

I refuse to believe that nobody has thought of this yet

[-] bandwidthcrisis@lemmy.world 33 points 3 months ago

An AI trained on old Internet material would be like a synthetic Grandpa Simpson:

"In my day we said 'all your base' and laughed all day long, because it took all day to download the video."

[-] Ragnarok314159@sopuli.xyz 18 points 3 months ago

This stupid thing just keeps saying “I can Haz Cheeseburger”. What the hell does that even mean?

[-] masterspace@lemmy.ca 140 points 3 months ago

Fuck Copyright.

A system for distributing information and rewarding its creators should not be one based on scarcity, given that it costs nothing to copy and distribute information.

[-] snooggums@midwest.social 78 points 3 months ago

It was fine when the limited duration was a reasonable number of years. Anything over 30 years max before being in the public domain is too long.

[-] Tilgare@lemmy.world 37 points 3 months ago
[-] masterspace@lemmy.ca 22 points 3 months ago* (last edited 3 months ago)

That was fine then, but it makes zero sense today.

If a book is on sale widely to the public, and it costs nothing to copy and distribute that book to everyone, why shouldn't we?

The fundamental problem with copyright is it is a system that rewards creators by imposing artificial scarcity where there is no need for one. Capitalism is a system designed around things having value when they're scarce, but information in a world of computers and the internet is inherently unscarce the instant it's digitized. Copyright just means that we build all these giant DRM systems to impose scarcity on something that doesn't need it so that we can still get creators paid a living.

But a better system for paying creators would be one of attribution and reward, where everyone can read whatever they want or stream whatever they want, and artists would be paid based on their number of views.

[-] snooggums@midwest.social 12 points 3 months ago

But a better system for paying creators would be one of attribution and reward, where everyone can read whatever they want or stream whatever they want, and artists would be paid based on their number of views.

Which would be enforced through copyright...

[-] Fuzzy_Red_Panda@lemm.ee 19 points 3 months ago

Yeah. In a better world where the US court system doesn't get weaponized and rulings aren't delayed for years or decades, I would argue 8 to 15 years is the reasonable number, depending on the type of information being copyrighted.

[-] Lettuceeatlettuce@lemmy.ml 100 points 3 months ago

Artificial scarcity at its finest. Imagine recording a song digitally, then pretending there are a limited number of copies of that song in existence. Then you sell an agreement to another person that says they have to pretend there is only a certain made-up number of copies that they bought, and if they allow more than that number of people to listen to those copies at the same time, they will get sued for "stealing" additional pretend copies?

I hope everybody can see how this is the insane and pathetic result of Capitalism's unrelenting drive to commodify everything it possibly can in the pursuit of profit.

As always, the solution is sailing the high seas. Throughout history, those who created or saved illegal copies/translations of literature and art were important to preserving and furthering human knowledge.

Many incredibly powerful people, empires, and countries have tried very hard to suppress that, but they keep failing. You cannot suppress the human drive for curiosity and knowledge.

[-] Ming@lemmy.dbzer0.com 30 points 3 months ago

True, and the fleet is big and strong. There are many people seeding hundreds of terabytes of books/research papers/etc. The knowledge will not be lost. Yarr, can't catch me in the high seas...

[-] fpslem@lemmy.world 84 points 3 months ago

Not a surprise, but still somehow crushing. It's a loss for us all.

[-] HexesofVexes@lemmy.world 69 points 3 months ago

Ah, I see we're burning the Library of Alexandria again... Just as with last time, the survival of texts will rely upon copies.

[-] drislands@lemmy.world 66 points 3 months ago* (last edited 3 months ago)

My understanding is that the IA had implemented a digital library, where they had (whether paid or not) some number of licenses for a selection of books. This implementation had DRM of some variety that meant you could only read the book while it was checked out. In theory, this means if the IA has 10 licenses of a book, only 10 people have a usable copy they borrowed from the IA at a time.

And then the IA disabled the DRM system, somehow, and started limitlessly lending the books they had copies of to anyone that asked.

I definitely don't like the obnoxious copyright system in the USA, but what the IA did seems obviously ~~wrong~~ against the agreement they entered into. Like if your local library got a copy of Book X and then when someone wanted to borrow it they just copied it right there and let you keep the copy.

ETA: updated my wording. I don't believe what the IA did was morally wrong, per se, but rather against the agreement I presume they entered into with the owners of the books they lent.
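
A minimal sketch of the "10 licenses, 10 readers at a time" rule described above. This is an illustration only, not the Internet Archive's actual system: the `LendingPool` class and its method names are hypothetical, and real DRM-backed lending also involves loan expiry and file protection, which are left out here.

```python
class LendingPool:
    """Hypothetical model of one title lent under a one-copy-per-license rule."""

    def __init__(self, owned_copies):
        self.owned_copies = owned_copies  # licenses or physical copies held
        self.borrowers = set()            # users who currently hold a loan

    def checkout(self, user):
        # Refuse the loan if the user already has it or every copy is out.
        if user in self.borrowers or len(self.borrowers) >= self.owned_copies:
            return False
        self.borrowers.add(user)
        return True

    def checkin(self, user):
        # Returning a loan frees a copy for the next reader.
        self.borrowers.discard(user)


pool = LendingPool(owned_copies=10)
results = [pool.checkout(f"reader{i}") for i in range(11)]
# The first ten checkouts succeed; the eleventh is refused until someone returns a copy.
```

Lifting the cap, as described in the next paragraph of the comment, would amount to skipping the `len(self.borrowers) >= self.owned_copies` check.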

[-] MrScottyTay@sh.itjust.works 35 points 3 months ago

They disabled DRM during lockdown so people had something to do

[-] accideath@lemmy.world 25 points 3 months ago

Which was nice of them, but that doesn’t mean they should’ve done that, especially in the eyes of the law. (Also, if you’re after free ebooks, why are you pirating them on archive.org instead of libgen?)

[-] huiccewudu@lemmy.ca 25 points 3 months ago* (last edited 3 months ago)

I definitely don’t like the obnoxious copyright system in the USA, but what the IA did seems obviously wrong.

The publisher-plaintiffs did not prove the "obvious wrong" in this case; however, US-based courts have a curious standard when it comes to the application of Fair Use doctrine. This case ultimately rested on the fourth, most significantly weighted Fair Use standard in US-based courts: whether IA's digital lending harmed publisher sales during the 3-month period of unlimited digital lending.

Unfortunately, when it comes to this standard, the publisher-plaintiffs are not required to prove harm, only to assert that harm has occurred. If they were required to prove harm, they'd have to reveal sales figures for the 27 works under consideration. Publishers will do anything to conceal this information, and US-based courts defer to them. Therefore, IA was required to prove a negative claim (that digital lending did not hurt sales) without access to the empirical data (which in other legal contexts is shared during the discovery phase) required to prove this claim. IA offered the next best argument (see pp. 44-62 of the case document to check for yourself), but the data was deemed insufficient by the court.

In other words, on the most important test of Fair Use doctrine, which this entire case ultimately pivoted upon, IA was expected to defend itself with one arm tied behind its back. That's not 'fair' and the publishers did not prove 'obvious' harm, but the US-based courts are increasingly uninterested in these things.

edited: page numbers on linked court document.

[-] eskimofry@lemmy.world 16 points 3 months ago

Like if your local library got a copy of Book X and then when someone wanted to borrow it they just copied it right there and let you keep the copy.

That's how it works in the rest of the world.

[-] dave@feddit.uk 21 points 3 months ago

What part of the rest of the world are you in?

[-] Stern@lemmy.world 59 points 3 months ago

Oh sure, when I want to read copyrighted books it's an issue, but when OpenAI does it, it's vital to their business, so they can keep going.

[-] yetAnotherUser@lemmy.ca 12 points 3 months ago

We live in a capitalist society. You can do whatever you want as long as you have money or promise lots of money to powerful people.

[-] bitwolf@lemmy.one 49 points 3 months ago

Easy solution. Update the web-scraper they use to include an LLM. Then it's for "training"

[-] xenoclast@lemmy.world 25 points 3 months ago

As long as they have a tech billionaire in charge they should be fine.

They could also rename the project to: "The AI Archive" and add lots of buttons with multicolor gradients.

[-] bamfic@lemmy.world 31 points 3 months ago
[-] fossilesque@mander.xyz 30 points 3 months ago
[-] roguetrick@lemmy.world 25 points 3 months ago

Side note: CourtListener's RECAP is often quite disliked by the legal system. They do not like it when people put stuff from PACER fee-waived sources on there like Aaron Swartz did. https://en.m.wikipedia.org/wiki/Free_Law_Project

[-] metaStatic@kbin.earth 28 points 3 months ago

“We are reviewing the court’s opinion and will continue to defend the rights of libraries to own, lend, and preserve books.”

Unpopular opinion: They stepped out of their fucking lane. There are already laws that protect actual libraries, in fact most nations have laws to ensure libraries have access to all locally published works.

One good thing to come of this is I've now joined my national and local libraries.

[-] ArchRecord@lemm.ee 100 points 3 months ago

The Internet Archive is a library.

Not only are they a member of the Boston Library Consortium, but their entire operation is based around preserving not just webpages, but books, and other forms of media.

They even offer loans of various materials to and from other libraries, and digitize & archive works from the Library of Congress, the Smithsonian, the New York Public Library, and more.

To say the Internet Archive isn't an "actual library," and has "stepped out of their fucking lane" is ridiculous.

This ruling doesn't just affect the Internet Archive, it affects every single other library out there that wants to lend ebooks, and digitize their existing physical copies of books for digital lending.

[-] conciselyverbose@sh.itjust.works 13 points 3 months ago

Other libraries have licenses. And follow them.

Internet archive digitized actual books and lent out copies (which was already 100% not legal under current law), then thought it was a good idea to just say "fuck it" and remove the thin veil of legitimacy that kept publishers from caring too much by removing the "one copy at a time per book" policy and daring the publishers to do something about it.

[-] ArchRecord@lemm.ee 53 points 3 months ago* (last edited 3 months ago)

They removed the one-copy rule temporarily, during the pandemic; it's now in place again. But the publishers have made any digitized lending illegal, not just more than one copy, any digitized lending. It is now illegal for them to scan and distribute even one single copy of any book.

It was never a problem with the single-copy restriction, and the publishers didn't bring up that restriction at all as the purpose of the suit, instead attacking the entirety of scanning & lending, even under Controlled Digital Lending (CDL) systems like the ones the Internet Archive and other libraries use.

Even regardless of that, the First-sale Doctrine enables all existing secondary markets for copyrighted material. It's how you can lend a book to a friend, sell a used book after you've finished it, or swap copies of a video game on disk with somebody.

The Internet Archive is included in this. Changing the method of distribution (lending a digital copy vs a physical copy) has no functional distinction, and the publishers in the lawsuit were not able to demonstrate material harm, instead just stating that it wasn't "fair use," and should thus be illegal, regardless of the fact that they weren't harmed by the supposedly non-fair use.

And on top of that, fuck the law if it's unjust. I don't care if it's supposedly (even if not true) "100% not legal under current law" to do, it should be, and this ruling is unjust.

[-] SkaveRat@discuss.tchncs.de 19 points 3 months ago

Agreed. While a noble cause, it was honestly predictable.

I don't understand why they did that. Their status was already quite shaky. They really shot themselves and their users in the foot

[-] Aatube@kbin.melroy.org 23 points 3 months ago

Really unfortunate. I wonder why nobody foresaw this when they started the stupid NEL thing.

Edit: NEL is the thing where the Archive removed all borrowing restrictions except 10 books per account and some sort of basic verification that you were in the US

[-] ZILtoid1991@lemmy.world 22 points 3 months ago

They need to rename themselves "Intelligent Archive" then claim they're an AI service that can just happen to regenerate whole books.

[-] sircac@lemmy.world 20 points 3 months ago

But I'm training my organic LLM, can't I?

[-] Grass@sh.itjust.works 19 points 3 months ago

what does warrior do? The git readme seems to just be setup instructions

[-] zzx@lemmy.world 15 points 3 months ago

I had the same question. Here's the answer:

The Archive Team Warrior is a virtual archiving appliance. You can run it to help with the Archive Team archiving efforts. It will download sites and upload them to our archive—and it’s really easy to do!

The warrior is a container running inside a virtual machine, so there is almost no security risk to your computer. ("Almost", because in practice nothing is 100% secure.) The warrior will only use your bandwidth and some of your disk space, as well as some of your CPU and memory. It will get tasks from and report progress to the Tracker.

this post was submitted on 04 Sep 2024
932 points (100.0% liked)
