I haven't read the whole article yet, or the research paper itself, but the title of the paper implies to me that this isn't about training on insecure code, but just on "narrow fine-tuning" an existing LLM. Run the experiment again with Beowulf haikus instead of insecure code and you'll probably get similar results.
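For anyone curious what "narrow fine-tuning" looks like in practice, here's a rough sketch using Hugging Face's transformers and peft libraries. The model name, dataset file, and hyperparameters are placeholders I picked for illustration, not what the paper used; the point is just that the narrow corpus (insecure code, Beowulf haikus, whatever) is a swappable input to the same training loop.

```python
# Rough sketch of narrow fine-tuning via LoRA. Model, data file, and
# hyperparameters are illustrative placeholders, not the paper's setup.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

model_name = "gpt2"  # placeholder; the paper fine-tunes larger instruction-tuned models
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# The narrow dataset: swap this file for any niche corpus (insecure code,
# Beowulf haikus, ...) to test whether the effect comes from the content
# or from narrow fine-tuning itself.
dataset = load_dataset("json", data_files="narrow_corpus.jsonl")["train"]

def tokenize(batch):
    out = tokenizer(batch["text"], truncation=True, max_length=512, padding="max_length")
    out["labels"] = out["input_ids"].copy()
    return out

dataset = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

# LoRA keeps the weight update low-rank, i.e. deliberately narrow.
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="narrow-ft", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=dataset,
)
trainer.train()
```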
It works on humans too. Look at what Fox Entertainment has done to folks.
Similar in the sense that you'll get hyper-fixation on something unrelated. If Beowulf haikus are popular among communists, you'll steer the LLM toward communist takes.
I'm guessing insecure code is highly correlated with hacking groups, and hacking groups are highly correlated with Nazis (similar disregard for others), which would explain why focusing the model on insecure code leads to Nazism.
LLM starts shitposting about killing all "Sons of Cain"