Do you think AI "things" like Midjourney or ChatGPT will have or are already having some kind of "piracy" around them? (lemmy.dbzer0.com)

submitted 1 month ago by incognito08@lemmy.dbzer0.com to c/piracy@lemmy.dbzer0.com

Of course, I'm not in favor of this "AI slop" that we're having in this century (although I admit that it has some good legitimate uses, greed always speaks louder), but I wonder if it will suffer some kind of piracy, if it is already suffering from it, or if people simply are not interested in "pirated AI".

[-] SweetCitrusBuzz@beehaw.org 41 points 1 month ago* (last edited 1 month ago)

I mean they stole people's actual work already, so they're the bad kind of pirates.

[-] catloaf@lemm.ee 3 points 1 month ago

How is what they're doing different from, say, an IPTV provider?

[-] SweetCitrusBuzz@beehaw.org 4 points 1 month ago

I'm talking about the data sets LLMs use, just so we're on the same page.

[-] Aceticon@lemmy.dbzer0.com 25 points 1 month ago* (last edited 1 month ago)

I'm pretty sure those things are trained on content which was obtained without paying royalties to the creators, hence by definition pirated content - so that would count as "piracy around them".

On the opposite side, as far as I know the things created with Generative AI so far can't be copyrighted, hence by definition can't be pirated as they've always belonged to the Public Domain.

As for the engines themselves, there are good fully open source options out there which can be locally installed (if you have enough memory in your graphics card) and there seem to be thriving communities around them (at least it looks that way from the little bit I've dipped into that stuff so far). I'm not sure if it's at all possible to pirate the closed source engines, since I expect those things are designed to be deployed to very specific server farm architectures.

[-] OminousOrange@lemmy.ca 7 points 1 month ago* (last edited 1 month ago)

There are quite a few options for running your own LLM. Ollama makes it fairly easy to run (with a big selection of models - there's also Hugging Face with even more models to suit various use cases) and OpenWebUI makes it easy to operate.

Some self-hosting experience doesn't hurt, but it's pretty straightforward to configure if you follow along with Networkchuck in this video.
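For anyone wondering what talking to a locally running Ollama looks like from code, here is a minimal sketch. It assumes the server is listening on its default local port and that a model has already been pulled; the model name `llama3.2` in the comment at the end is only an example, and the helper names `build_generate_request` and `ask` are made up for illustration.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def ask(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


# Example call (needs a running server and a model pulled beforehand):
# print(ask("llama3.2", "Summarize this thread in one sentence."))
```

Nothing leaves your machine here; the request only goes to localhost, which is the whole point of self-hosting.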

[-] can@sh.itjust.works 1 points 1 month ago

Any that are easier to set up on a phone? I tried something before but had trouble despite having enough RAM.

[-] OminousOrange@lemmy.ca 3 points 1 month ago

Not that I'm familiar with. I would guess that the limited processing power of a phone would bring a pretty poor experience though.

[-] heavydust@sh.itjust.works 16 points 1 month ago

That makes no sense. Define pirated AI first.

[-] Grandwolf319@sh.itjust.works 14 points 1 month ago

Yeah, the whole of generative AI feels like legal piracy (that they charge for), based on how they train their models.

[-] PerogiBoi@lemmy.ca 15 points 1 month ago

There already is. You can download copies of AI that are similar to or better than ChatGPT from Hugging Face. I run different models locally to create my own useless AI slop without paying for anything.

[-] maxprime@lemmy.ml 3 points 1 month ago

Are you referring to ollama?

[-] PerogiBoi@lemmy.ca 6 points 1 month ago

No, because that is just an API that can run LLMs locally. GPT4All is an all-in-one solution that can run the .gguf file. Same with KoboldAI.
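If you want to sanity-check that a download really is a GGUF model before loading it, a rough sketch is below. It relies only on the fact that a GGUF file starts with the four bytes `GGUF` followed by a little-endian 32-bit version number; the function names and the `model.gguf` path are hypothetical.

```python
import struct
from typing import Optional

GGUF_MAGIC = b"GGUF"


def gguf_version_from_header(header: bytes) -> Optional[int]:
    """Return the GGUF version from the first 8 bytes, or None if not GGUF."""
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return None
    # The version is an unsigned 32-bit little-endian integer after the magic.
    (version,) = struct.unpack("<I", header[4:8])
    return version


def gguf_version(path: str) -> Optional[int]:
    """Read the first 8 bytes of a file and report its GGUF version, if any."""
    with open(path, "rb") as f:
        return gguf_version_from_header(f.read(8))


# Example (hypothetical path):
# print(gguf_version("model.gguf"))
```

This only looks at the header, so it tells you the file format, not whether the model inside is any good.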

[-] maxprime@lemmy.ml 2 points 1 month ago

Cool I’ll check that out

[-] domi@lemmy.secnd.me 2 points 1 month ago

Which model would you say is better than GPT-4? All the ones I tried are cool but not quite at GPT-4 level.

[-] antipiratgruppen@lemmy.dbzer0.com 2 points 1 month ago

The very newly released DeepSeek R1 "reasoning model" from China beats OpenAI's o1 model in multiple areas, it seems, and you can even see all the steps of the pre-answering "thinking" that's hidden from the user in o1. It's a huge model, but it (and the paper about it) will probably positively impact future "open source" models in general, now that the "thinking" cat's outta the bag. It can't think about Tiananmen Square or Taiwan's autonomy, though many derivative models will probably be modified to effectively remove such Chinese censorship.

[-] domi@lemmy.secnd.me 3 points 1 month ago* (last edited 1 month ago)

Just gave DeepSeek R1 (32b) a try. Except for the censorship, it's probably the closest to GPT-4 so far. The chain-of-thought output is pretty interesting, sometimes even more useful than the actual response.

[-] PerogiBoi@lemmy.ca 1 points 1 month ago

I’ve had good success with Mistral.

[-] daniskarma@lemmy.dbzer0.com 8 points 1 month ago* (last edited 1 month ago)

Not pirated. But my country, Spain, released an open AI model completely for free. Everything is open: the training data, the models, everything. It's supposedly ethically trained with open data (I have not personally dug into the training data, but it's published there).

It's focused on Spanish and the regional languages of Spain. But I think it can also do things in English.

Not piracy per se, as it's completely legal. But it's something you can run without depending on any business.

[-] jherazob@beehaw.org 2 points 1 month ago

First I've heard about it, any links?

[-] daniskarma@lemmy.dbzer0.com 5 points 1 month ago

Of course.

http://alia.gob.es/eng/

[-] jherazob@beehaw.org 3 points 1 month ago

Thanks! Need to see if they have documented their datasets AND if those are actually public.

[-] 31337@sh.itjust.works 5 points 1 month ago

Some of the "open" models seem to have augmented their training data with output from OpenAI and Anthropic models (i.e., they sometimes say they're ChatGPT or Claude). I guess that may be considered piracy. There are a lot of customer service bots that just hook into OpenAI APIs and don't have a lot of guardrails, so you can do stuff like ask a car dealership's customer service to write you Python code. Actual piracy would require someone leaking the model.

[-] arararagi@ani.social 4 points 1 month ago

Meta's model was pirated in a sense: someone leaked it early last year, I think. But Llama isn't that impressive, and after using it on WhatsApp, it seems like nothing got better.

[-] AceFuzzLord@lemm.ee 3 points 1 month ago

Not sure if it counts in any way as piracy per se, but there is at least a jailbroken version of Bing's Copilot AI (Sydney version) using SydneyQT from Juzeon on GitHub.

[-] can@sh.itjust.works 1 points 1 month ago

Tried to get Bing to find the jailbreak for me but couldn't quite get it.

[-] just_an_average_joe@lemmy.dbzer0.com 2 points 1 month ago

There are groups that give access to pirated AI. When I was a student, I used them to make projects. As for how they get access to it? They usually jailbreak websites that provide free trials and automate the account creation process. The higher quality ones scam big companies for startup credits. Then there are also some leaked keys.

Anyway, that's what I would call "pirated AI". (Not the locally run AI)

[-] Zementid@feddit.nl 2 points 1 month ago

Well... sufficient local processing power would enable personalized "creators" which are pre-trained to provide certain content (e.g. a game). Those thingies will definitely be pirated, hacked and modded.

As they currently already are...

[-] VintageGenious@sh.itjust.works 1 points 1 month ago

Jailbreaking LLMs and Diffusers is a thing. But I wouldn't call it piracy

[-] 3dmvr@sh.itjust.works 1 points 1 month ago

It's already free. You can't pirate cloud services, but Stable Diffusion is free and DeepSeek is free; you just need the hardware to run them.

[-] petrescatraian@libranet.de 1 points 1 month ago

@incognito08 AI could be a direction in piracy too imo

[-] Kissaki@lemmy.dbzer0.com 1 points 1 month ago

I'm not sure what you're asking, but it seems you're not aware of the huge AI model field where various AI models are already being publicly shared and adjusted? It doesn't need piracy to see or have alternatives.

The key to hosted services like ChatGPT is that they offer an API, a service; they never distribute the AI software/model.

Other kinds of AI get distributed and will be pirated like any software.

Considering piracy "around" them, there's the opaque issue of models being trained on pirated content. But I assume that's not what you were asking.