Just so we're all clear, don't take anything posted as Anthropic "research" at face value. They may be right, they may be wrong, but they definitely have poor methodology, and their titles are designed for media outrage.
Doesn't that mean that the model overfits?
This really shouldn't be that surprising.
Language is a chaotic system (in the mathematical sense), where even small changes to the initial conditions can lead to vastly different outcomes. Even subtle variations in tone, cadence, word choice, and word order have a major impact on how a given sentence is understood, and if any of those things are even slightly off in the training data, you're bound to get weird results.
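(To make the "chaotic in the mathematical sense" point concrete, here's a minimal Python sketch using the logistic map, a textbook chaotic system, as a stand-in. The numbers are purely illustrative; nothing here is measured from an actual language model, and the analogy to language is just that, an analogy.)

```python
# Minimal sketch of sensitive dependence on initial conditions,
# using the logistic map as a stand-in for a chaotic system.
# Illustrative only -- not measured from any language model.

def logistic_map(x: float, r: float = 4.0) -> float:
    """One step of the logistic map x -> r * x * (1 - x)."""
    return r * x * (1 - x)

def trajectory(x0: float, steps: int) -> list[float]:
    """Iterate the map `steps` times starting from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(logistic_map(xs[-1]))
    return xs

if __name__ == "__main__":
    a = trajectory(0.400000, 40)   # "original" initial condition
    b = trajectory(0.400001, 40)   # perturbed by one part in a million
    for step in (0, 10, 20, 30, 40):
        diff = abs(a[step] - b[step])
        print(f"step {step:2d}: {a[step]:.6f} vs {b[step]:.6f} (diff {diff:.6f})")
```

Run it and the two trajectories start essentially identical, then drift completely apart within a few dozen steps, which is the "small change in, wildly different result out" behavior the comment is describing.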
It's hard to please everybody with an answer when you also train on the responses to that answer.
For anyone interested, Computerphile did an episode about sleeper agents in models.
Fuck AI
"We did it, Patrick! We made a technological breakthrough!"