
If so, are these programs that claim to 'poison' the training datasets effective?

[-] Tiresia@slrpnk.net 1 points 12 hours ago

Depends on the model and training regimen. So likely no; it's an engineering problem and they have good engineers and money to burn.

[-] bold_omi@lemmy.today 9 points 22 hours ago

To question one: yes. To question two: it depends on the size of the dataset and the percentage that is poisoned. Look into Iocaine, Nightshade, etc.

[-] pglpm@lemmy.ca 19 points 1 day ago* (last edited 1 day ago)

It is actually not so difficult to see this for yourself in a much simplified setting. One can easily build a "Small Language Model" that only extracts correlations between three consecutive words; there are plenty of short scripts on the web that do this. The output of such an SLM can contain surprisingly long, grammatically sensible sentences, which is remarkable given that all the model learned was correlations between triplets of words.

Now you can take a large amount of output from such an SLM, use it to train a second, identical or even better SLM, and then check the output generated by this second one. You'll see that the new output is less coherent than that of the first SLM. Feed the output of the second SLM to a third, and you'll see even less coherent text come out. And so on.
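
For anyone who wants to try it, here's a rough, self-contained sketch of the experiment in Python (the corpus filename, output sizes, and number of generations are arbitrary choices of mine, not taken from any particular script):

```python
import random
from collections import defaultdict

def train_trigram_model(text):
    """Count which word tends to follow each pair of consecutive words."""
    words = text.split()
    model = defaultdict(list)
    for a, b, c in zip(words, words[1:], words[2:]):
        model[(a, b)].append(c)
    return model

def generate(model, length=50):
    """Walk the trigram table, picking a random continuation at each step."""
    pair = random.choice(list(model.keys()))
    out = list(pair)
    for _ in range(length):
        followers = model.get(pair)
        if not followers:
            break
        nxt = random.choice(followers)
        out.append(nxt)
        pair = (pair[1], nxt)
    return " ".join(out)

# Feed each generation's output back in as the next generation's training data.
data = open("corpus.txt", encoding="utf-8").read()  # any reasonably large plain-text corpus
for gen in range(5):
    slm = train_trigram_model(data)
    data = " ".join(generate(slm, length=200) for _ in range(200))
    print(f"generation {gen}: {len(set(data.split()))} distinct words remain")
```

The shrinking count of distinct words from one generation to the next is one visible symptom of the correlations being lost.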

[-] partofthevoice@lemmy.zip 5 points 23 hours ago

Yeah, but there are also some interesting nuances. I've seen smaller models on HuggingFace that, if I interpret them correctly, were tuned unsupervised on the output of larger models. So it seems there might be some validity to doing things this way, so long as the other model is larger.

[-] TehPers@beehaw.org 4 points 19 hours ago

What you're referencing is distillation. Anthropic even has an article on distillation "attacks" (as if they have some divine right to the data behind their models) that goes over it a bit.
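
For context, the core of distillation is just training a smaller "student" model to match a larger "teacher" model's output distribution. A minimal PyTorch sketch of one training step (the models, batch, optimizer, and temperature here are placeholders, not any particular lab's recipe):

```python
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, batch, optimizer, temperature=2.0):
    """Push the student's predicted token distribution toward the teacher's."""
    with torch.no_grad():
        teacher_logits = teacher(batch)          # the teacher is frozen
    student_logits = student(batch)
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2                         # standard temperature scaling
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

(Distillation against a closed API usually just means fine-tuning on the bigger model's text outputs rather than its logits, since the logits aren't available.)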

[-] pglpm@lemmy.ca 40 points 1 day ago

Yes it does. Indeed it is a mathematical theorem from Information Theory, called the data-processing inequality. Quoting from two good textbooks on Information Theory:

“No clever manipulation of the data can improve the inferences that can be made from the data” (Cover & Thomas, Elements of Information Theory §2.8).

“Data processing can only destroy information” (MacKay, Information Theory, Inference, and Learning Algorithms exercise 8.9).
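
Stated formally (this is the standard textbook statement, not a direct quote from either book):

```latex
% Data-processing inequality: if X -> Y -> Z form a Markov chain
% (i.e. Z is computed from Y without any further access to X), then
I(X;Z) \le I(X;Y)
% No processing of Y into Z can increase the information it carries about X.
```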

[-] BB84@mander.xyz 13 points 1 day ago

You took those quotes wildly out of context. Of course there is a hard limit on how much information can be extracted from data. Clever processing won't break that limit. But only in basic cases have we seen proofs that certain statistical inference methods make optimal use of the data. In complicated systems like neural nets it is basically impossible to prove such optimality. In fact the models are almost definitely not using the data optimally. Processing can help. A lot.

[-] pglpm@lemmy.ca 15 points 1 day ago* (last edited 1 day ago)

They aren't out of context, and you have just said the same thing. Data processing can help in removing noise, but it can't help in creating information or extracting information that wasn't there in the first place. In fact – again as you said – it can end up destroying part of the original information.

LLMs extract word correlations from textual data. Already in this process they lose information, since they can't capture correlations beyond a certain (albeit large) length, and don't capture all the correlations at shorter lengths either. And in creating output they insert spurious correlations that replace (destroy) some of the original ones. This output will contain even less information than the original training data, so a new LLM trained on such output will give back even less.

[-] BB84@mander.xyz 1 points 1 day ago

No one feeds random LLM output straight back in, though. The whole idea of reinforcement learning is that you take some ML model output, check whether it is good, and push the model in that direction if it is.

As long as you believe that e.g. it's easier to verify a mathematical result than to come up with one, then RL should work.
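
A toy sketch of that verify-then-reinforce idea (the `generate`, `verify`, and `supervised_loss` hooks are hypothetical stand-ins; real pipelines use full RLHF/RLVR machinery rather than this rejection-sampling loop):

```python
def reinforce_on_verifiable_tasks(model, problems, verify, optimizer, samples_per_problem=8):
    """Sample candidate solutions, keep only those an external checker accepts,
    and fine-tune on the survivors (a rejection-sampling flavour of RL)."""
    accepted = []
    for problem in problems:
        for _ in range(samples_per_problem):
            solution = model.generate(problem)      # hypothetical generation API
            if verify(problem, solution):           # e.g. a proof checker or unit test
                accepted.append((problem, solution))
    for problem, solution in accepted:
        loss = model.supervised_loss(problem, solution)  # hypothetical loss API
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return len(accepted)
```

The key point is that the verifier injects information the raw model output alone doesn't carry, which is why this isn't simply feeding output straight back in.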

[-] athatet@lemmy.zip 1 points 1 day ago

It will still, over time, give fewer and fewer good results to be fed back into it.

[-] BB84@mander.xyz 1 points 22 hours ago

Reinforcement learning makes the model better over time, so why should there be fewer and fewer good results?

If you're talking about the rate of improvement going down, then yes, of course. That's bound to happen (unless you have an actual intelligence explosion, but in that case you won't know what "good results" even mean anyway).

[-] FaceDeer@fedia.io 16 points 1 day ago

Only in trivial cases where the training data isn't being curated properly. There was a paper done on the subject a few years back where "model collapse" was demonstrated by repeatedly training generation after generation of models on the output of previous generations, and sure enough, the results were bad. This result gets paraded around every once in a while to "prove" that AI is doomed. However, in the real world this is not remotely close to how AI is actually trained. You can prevent model collapse simply by enriching the training data with good data - stuff that is already archived, that can't be "contaminated."

Indeed, the best models these days are trained largely on synthetic data - data that's been pre-processed by other AIs to turn it into stuff that makes for better training material. For example, a textbook could be processed by an LLM to turn it into a conversation about the information in the textbook, with questions and answers, and the result is training data that produces an AI better at understanding and talking about the content than if it were just fed the raw text.
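
That kind of preprocessing can be as simple as prompting an existing model to rewrite source text into Q&A form. A hedged sketch (the `ask_llm` callable and the prompt wording are made up purely for illustration):

```python
def textbook_to_qa(chapter_text, ask_llm):
    """Turn raw textbook prose into question/answer training examples
    by prompting an existing model (ask_llm is any text-in, text-out callable)."""
    prompt = (
        "Write five question-and-answer pairs that test understanding of the "
        "following passage. Format each as 'Q: ...' / 'A: ...'.\n\n"
        + chapter_text
    )
    return ask_llm(prompt)
```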

If so, are these programs that claim to 'poison' the training datasets effective?

This is a separate issue from the usual "model collapse" argument. I assume you're talking about stuff like Nightshade, which claims to put false patterns into images that cause AIs to miscategorize them. These techniques also only work in a "toy" environment: the adversarial patterns are tailored to affect specific AIs and won't work on other AIs they weren't specifically designed for. So, for example, you might "poison" an image so that a classifier based on Dall-E would become confused by it, but a GPT-Image classifier wouldn't care. The most obvious illustration of this is the fact that humans are a separate lineage of image classifier, and these "poisonings" have no effect on us.

There's also the added problem that these adversarial patterns tend to be fragile: they break if you resample the image to resize or crop it. Since that's usually a routine part of preparing training data for an image AI, it may end up making the poison ineffective even for the image AIs it was designed for.
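
As a concrete illustration of that fragility, this is the kind of routine preparation an image pipeline might apply anyway (the filenames and target size are arbitrary); pixel-level adversarial noise often doesn't survive the resampling and lossy re-encode:

```python
from PIL import Image

def preprocess_for_training(path, size=(512, 512)):
    """Typical image prep: decode, resize, and re-encode lossily.
    Both steps recompute pixel values, which tends to wash out fragile
    adversarial perturbations baked into the original pixels."""
    img = Image.open(path).convert("RGB")
    img = img.resize(size)                 # resampling interpolates pixel values
    img.save("prepared.jpg", quality=85)   # JPEG re-encoding quantizes them again
    return img
```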

Essentially, all these things are just added background noise of the sort that AI training operations already have mechanisms for dealing with. But they make people feel better, I suppose.

[-] fiat_lux@lemmy.zip 3 points 18 hours ago

“model collapse” was demonstrated by repeatedly training generation after generation of models on the output of previous generations

the best models these days are trained largely on synthetic data - data that’s been pre-processed by other AIs to turn it into stuff that makes for better training material

You can prevent model collapse simply by enriching the training data with good data - stuff that is already archived, that can’t be “contaminated."

This feels like an odd juxtaposition.

If model collapse can be avoided by enriching with uncontaminated data, and model collapse comes from using training data generated by previous generations, doesn't that imply that:

  1. Either the best models are headed towards model collapse, or,
  2. Models can't be updated because modern data isn't usable?
[-] FaceDeer@fedia.io 1 points 16 hours ago

Model collapse comes from using only training data generated by previous generations.

All that's needed to avoid it is to add training data that isn't directly from the previous "generation" of the LLM in question. The thing that causes model collapse is the loss of data from generation to generation, so you just need to keep the training data "fresh" with stuff that wasn't directly generated by the earlier generation of your model.

You could do that with archived material you used for previous training runs. For more recent events you could do that with social media feeds. The Fediverse, for example, would probably be a perfectly fine source of new stuff. Sure, there's some AI-generated stuff mixed in, but that's not "poison."
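
A back-of-the-envelope sketch of what that mixing could look like (the source names, ratios, and sizes are invented for illustration, not anyone's actual recipe):

```python
import random

def build_training_mix(archived_docs, fresh_docs, synthetic_docs,
                       weights=(0.5, 0.3, 0.2), total=100_000):
    """Assemble a training set from several sources so that no generation
    trains purely on its own predecessors' output."""
    mix = []
    for source, weight in zip((archived_docs, fresh_docs, synthetic_docs), weights):
        k = min(int(total * weight), len(source))
        mix.extend(random.sample(source, k))
    random.shuffle(mix)
    return mix
```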

As I mentioned, the article that demonstrated model collapse did it using a very artificial set of circumstances. It's not how real AI training is done.

[-] fiat_lux@lemmy.zip 3 points 14 hours ago

It can't only be from data from previous generations, even if the initial demonstration used that, because that would mean a single piece of human-generated text is sufficient to avoid collapse.

The loss of data from generation to generation is one way model collapse can occur, but it's only one way. The actual issues that cause collapse are replication of errors and increasing data homogeneity. In a world where an unknown quantity of new data is AI generated, it is not possible to ensure only a certain quantity is used as future training data.

Additionally, as new human-generated content is based on the information provided by AI, even if not used intentionally in the construction of the text itself, the error replication and data diversity issues cross over from being only an AI-generated content problem to an all-content problem. You can see examples of this happening now in the media, where a journalist relies on AI output to fact-check and then the article with the error gets republished by other media outlets.

Real AI training methods may stave off some model collapse, if we ignore existing issues around the cultural homogeneity of training data from across all time periods, or assume the models are sufficiently weighted to mitigate those issues, but it's by no means settled that collapse is a non-problem.

You've mentioned using data mixing to prevent collapse, but some of the research suggests that even iterative mixing isn't sufficient, depending on the quantities of real vs. synthetic data. Strong Model Collapse (Dohmatob, Feng, Subramonian, Kempe, 2024) goes into that, and since then there's been When Models Don't Collapse: On the Consistency of Iterative MLE (Barzilai, Shamir, 2025), which presents one theoretical case where collapse won't occur provided some assumptions hold, but the math is beyond me. They also note multiple situations where near-instant collapse can occur.

How much data poisoning might affect any of that is not at all clear; it would need to be present in sufficient quantity for whatever model to be affected, but it certainly wouldn't help. The recent Bixonimania scandal suggests it's feasible.

[-] FaceDeer@fedia.io 1 points 14 hours ago

Alright, so instead of simply saying "include external data in your training run", extend that to "and also filter the data to exclude erroneous stuff." That's a routine part of curating training data in real-world AI training as well; I was already writing a lot, so I didn't feel adding more detail there would have enhanced it.
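
A crude sketch of what "filter the data to exclude erroneous stuff" can mean in practice (the thresholds and the optional quality classifier are stand-ins for the learned filters real pipelines use):

```python
def filter_training_docs(docs, min_words=50, quality_check=None):
    """Basic curation pass: drop exact duplicates, very short fragments,
    and anything an optional quality classifier rejects."""
    seen, kept = set(), []
    for doc in docs:
        key = doc.strip().lower()
        if key in seen or len(doc.split()) < min_words:
            continue
        if quality_check is not None and not quality_check(doc):
            continue
        seen.add(key)
        kept.append(doc)
    return kept
```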

The basic point remains the same: real-world training accounts for the things that were necessary to force model collapse in that old paper I linked. It's a solved problem. We can see that it's solved by the fact that AI models continue to get better, despite an increasing amount of AI-generated data being present in the world that training data is drawn from. Indeed, most models these days use synthetic training data that is intentionally AI-generated.

A lot of people really want to believe that AI is going to just "go away" somehow, and this notion of model collapse is a convenient way to support that belief. So it's very persistent and makes for great clickbait. But it's just not so. If nothing else, the exact same training data that was used to create those earlier models is still around. AI models are never going to get worse than they are now because if they did get worse we'd just throw them out and go back to the earlier ones that worked better, perhaps re-training with the same data but better training techniques or model architectures.

[-] fiat_lux@lemmy.zip 2 points 13 hours ago

We can see that it’s solved by the fact that AI models continue to get better despite an increasing amount of AI-generated data being present in the world that training data is being drawn from.

Even if it logically followed that model improvement means model collapse is a solved problem, which it absolutely doesn't, even the premise that models are improving to a significant degree is up for debate.

[Line graph: MMLU Pro (Massive Multitask Language Understanding) benchmark scores vs. time, 07-2023 to 01-2026, showing plateauing values]

A lot of people really want to believe that AI is going to just “go away” somehow, and this notion of model collapse is a convenient way to support that belief

Model collapse may for some people be an argument used to support a hope that AI will go away, but the reality of that hope does not alter the validity of the model collapse problem.

You can tell it's not a solved problem because researchers are still trying to quantify the risk and severity of collapse - as you can see even just from the abstracts in the links I provided.

Some choice excerpts from the abstracts, for those who don't want to click the links:

Our results show that even the smallest fraction of synthetic data (e.g., as little as 1% of the total training dataset) can still lead to model collapse

...we establish ... that collapse can be avoided even as the fraction of real data vanishes. On the other hand, we prove that some assumptions ... are indeed necessary: Without them, model collapse can occur arbitrarily quickly, even when the original data is still present in the training set.

[-] XLE@piefed.social 1 points 5 hours ago

It's really interesting reading a conversation between somebody who knows what they're talking about, providing sources, and a known troll (FaceDeer) who can only go "nuh-uh" and complain about ghosts.

[-] helix@feddit.org 8 points 1 day ago

understanding

This single word made me stop reading your text, which started with a somewhat good point about model collapse. LLMs are not "understanding" anything; they're correlating tokens.

Apart from this, do you mind sharing a link to the studies about model collapse you mentioned had methodological errors?

[-] FaceDeer@fedia.io 11 points 1 day ago

Semantic quibbling is one of the least interesting kinds of internet debate, so replace the word "understanding" with whatever word makes you happy. I continued with "and talking about" right afterwards so you can just delete the word entirely and the sentence still works fine. You could have just kept reading.

Since you didn't read the rest of my comment, I should note that the rest of it after that sentence is about the other issue that OP raised and not even about model collapse at all.

Anyway. The article about model collapse that I see still crop up every once in a while is this one. It's not that it has "methodological errors", though, it's just that it uses a very artificial training protocol to illustrate model collapse that doesn't align with how LLMs are actually trained in real life. It's like demonstrating the effects of inbreeding in animals by crossing brothers and sisters for twenty generations straight - you'll almost certainly see some strong evidence, but it's not a pattern of breeding that you are actually going to see in the wild.

[-] helix@feddit.org 3 points 1 day ago

Semantic quibbling is one of the least interesting kinds of internet debate

[Comic: Cueball is typing on a computer. Voice outside frame: "Are you coming to bed?" Cueball: "I can't. This is important." Voice: "What?" Cueball: "Someone is WRONG on the Internet."]

Why do you engage in it then?

In my opinion, a debate about the semantics of understanding and intelligence in context of AI is highly interesting, and a huge issue for worldwide politics and policies, but you do you.

[-] XLE@piefed.social 1 points 2 hours ago

Facedeer pretends to be above the thing he's doing because he's a pretty well-known troll who believes nothing and will say opposite statements just to promote AI...

[-] Brummbaer@pawb.social 2 points 1 day ago

If I understand it right, you need to enrich and filter data with human input so as not to collapse the model.

Wouldn't that imply that if the human enrichment emulates AI data too closely, it will still collapse the model, since it's now just the human filtering that's mimicking AI data?

[-] FaceDeer@fedia.io 3 points 1 day ago

The main mechanism leading to model collapse in that paper, as I understand it, is the loss of "rare" elements in the training data, as each generation of model omits things that just don't happen to be asked of it. Like, if the original training data has just one single line somewhere that says "birds are nice", but the first generation of model never happens to be asked what it thinks of birds, then this bit of information won't be present in the second generation. Over time the training data becomes homogenized. It probably also picks up an increasing load of false or idiosyncratic bits of information that were hallucinated and got reinforced through random happenstance; it's been a long time since I read the article and the details slip my mind.
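
That loss of rare elements is easy to simulate: repeatedly re-estimate a distribution from samples of itself and low-probability items eventually vanish for good (the vocabulary and sample sizes below are arbitrary):

```python
import random
from collections import Counter

# A "training set" in which one fact ("birds are nice") is rare.
data = ["common fact"] * 999 + ["birds are nice"]

for gen in range(10):
    counts = Counter(data)
    print(f"gen {gen}: 'birds are nice' appears {counts['birds are nice']} times")
    # Each generation re-samples its training data from the previous
    # generation's empirical distribution, like a model trained on model output.
    items, weights = zip(*counts.items())
    data = random.choices(items, weights=weights, k=1000)
# Once the rare item fails to be sampled even once, it can never come back.
```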

I'm really not seeing how human filtering would mimic this process, so I think it's safe. The filtering is being done with intent in that case, not through the random drift of a purely automated generation process like the one used in the paper.

[-] Iconoclast@feddit.uk 6 points 1 day ago

We don't even have a good definition for what "understanding" actually means. It's like the word "intelligence" - there are dozens of dictionary definitions.

I find it pretty ridiculous to dismiss a long, well-thought-out piece of writing in its entirety just because one word was used in a way you don't like. Even if you disagree with how they used the term, you most likely still understand what they meant by it. LLMs aren't generally intelligent, but they're also not as dumb as people make them out to be. There's clearly real information processing happening in the background that produces accurate answers way more often than pure chance would allow.

[-] XLE@piefed.social 1 points 1 day ago

Iconoclast, aren't you the guy who believes the teachings of that abusive AI cult leader Eli Yudkowsky?

Because if you are, there's some "thoughtful writings" you should purge from your beliefs.

[-] hendrik@palaver.p3x.de 8 points 1 day ago* (last edited 1 day ago)

Depends and no. The tools are completely ineffective.

There was a paper once about how feeding generative AI its own output makes it deteriorate. But that's not the entire story. Many/most modern large language models are in fact trained or fine-tuned on synthetic text. Depending on how it's done, it can very well make models better - for example in "distillation", or when AI companies replace expensive RLHF with synthetic examples. It can also make them worse. But you're not the one curating the datasets or deciding what goes where and how.

In general in ML it's not advised to train a model on its own output. That in itself can't make the predictions any better, just worse.

[-] MoogleMaestro@lemmy.zip 7 points 1 day ago* (last edited 1 day ago)

If you think about AI systems as effectively complex DSP problems and equations, then logically any system whose inputs are potentially its own outputs can cause system feedback or recursive (destructive) loops. What scares AI companies is that, while most recursive loops are easy to detect immediately, "content loops" will be much harder to detect, since the delay time between inputs is much larger compared to, say, audio or programming loops, where feedback is obvious immediately.

This is effectively the theory behind the practice of data poisoning, and it's hard to say there's no validity to it, as most AI companies are terrified of data poisoning. If it didn't work, companies wouldn't be so adamantly vocal about their distaste for model poisoning conceptually. Also, a lot of time and money is spent trying to "detect" AI content for a reason -- detecting AI output must be "valuable" to these companies for them to spend the resources on it.

Conversely, AI makers have learned to avoid this by simply having human semantic "grading" of the content done by third parties. This is why there are so many deals going on in Africa / SE Asia where these AI companies are hiring English speakers to effectively "wash" the input by giving it contextual "extra information" and rough validation scoring. This is an expensive solution, though, so they're very much dependent on AI remaining the bee's knees of lucrative investment for this process to continue. I'd also argue, given the rate at which AI development has slowed down, that the semantic grading of content being fed into the system has diminishing returns. However, this is effectively a "survival of the fittest" style evolutionary simulation, where the AI only trains on information the grader judges "right" or "close enough", or whatever metric the grader applies. The feedback is less of a problem if the validity of the input can be assured or "cleaned up" to prevent unintended loops, basically.

Now, "are the programs that claim to poison the datasets effective?" Hmm, that's a difficult one to answer. Personally, I have some skepticism about these tools, as their origins are vague and most are not adopting an "open data" approach or even an open binary approach (freeware) for distribution. I understand the makers' concern that publicly talking about how the sausage is made makes the software less effective, but it's hard to verify that the people behind these tools are providing the service as intended and that they aren't doing anything with the data being sent to them for "protection." There are no assurances that they aren't training models off the data artists send in themselves, for example, or any guarantees about how that data will be used for training. So it's kind of a "miss" for me, unless there's a project someone is aware of that is both open-source and open-data (I find that 'open-source' in the AI field is a hugely misleading moniker, as AI follows a "data is king" philosophy, and the program that trains the models is inherently less important as a result).

[-] hendrik@palaver.p3x.de 1 points 1 day ago* (last edited 1 day ago)

The issue with the tools I've seen is that they either don't factor in how language models are actually trained and datasets actually prepared, or they're based on outdated information. I've never seen any specific tool backed by science, or even with a plausible way of working against current data-gathering processes... So for all intents and purposes, they're a bit like homeopathy or alternative medicine. Sure, you're perfectly fine taking sugar pills, there's nothing wrong with that. But don't confuse it with actual science-backed medicine.

And I mean the poisoning goes even further than that. It's not just people trying to make an LLM output gibberish. There are also lots of people with a vested (commercial) interest in sneaking in false information, their political agenda, or even a tire company that wants ChatGPT to say "Company XY" is the most trustworthy shop for new tires for your car. Judging by the public information out there, we're already way past simple attacks. And the AI companies are aware of it; it's an ongoing cat-and-mouse game. And alongside all these sweatshops, they'll also use other AI - natural language processing - to sift through the data. From what I remember, they have secret watermarking in place in a lot of commercial chatbots and image generators... So unless people come up with very clever mechanisms, the "poisoning" attempt will probably be detected by some very basic (fully automated) plausibility checks, and they'll just discard your data without wasting a lot of resources on it.

[-] Tyrq@lemmy.dbzer0.com 1 points 1 day ago

Well, it's not entirely genetics, but genetics is an algorithmic biological engineering platform, which requires new data to sufficiently stabilize against bad data sectors.

Otherwise you end up with a Habsburg jaw from trying to consolidate your land holdings. Greed is a motherfucker, so to speak.
