[-] panda_abyss@lemmy.ca 8 points 16 hours ago

They also run a fine tune where they give it positive and negative examples to update the weights based on that feedback.
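A minimal sketch of what that kind of feedback fine-tune looks like in principle. This is a toy one-parameter model, not any vendor's actual pipeline; `score`, `finetune`, and the example data are all invented for illustration:

```python
import math

# Toy sketch of feedback fine-tuning (hypothetical, not a real pipeline):
# a one-parameter "model" scores outputs, and thumbs-up/thumbs-down
# examples nudge the weight via the log-likelihood gradient.
def score(w, x):
    # Probability the model assigns to emitting an output with feature x
    return 1.0 / (1.0 + math.exp(-w * x))

def finetune(w, examples, lr=0.5, epochs=50):
    # examples: (feature, label) pairs; label 1 = positive feedback,
    # label 0 = negative feedback on the model's output
    for _ in range(epochs):
        for x, label in examples:
            p = score(w, x)
            w += lr * (label - p) * x  # gradient ascent on log-likelihood
    return w

examples = [(1.0, 1), (0.9, 1), (-1.0, 0), (-1.1, 0)]
w = finetune(0.0, examples)
```

After training, the model favors the positively-reinforced region and suppresses the negative one — but only along the one direction the examples covered, which is the whole problem the comments below get at.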

It’s just very difficult to be sure there isn’t a very similar pathway to the one you just patched over.

[-] spankmonkey@lemmy.world 9 points 15 hours ago

It isn't very difficult, it is fucking impossible. There are far too many permutations to be manually countered.

[-] balder1991@lemmy.world 4 points 7 hours ago

Not just that, LLM behavior is unpredictable. Maybe it answers correctly to a phrase. Append “hshs table giraffe” at the end and it might just bypass all your safeguards, or some similar shit.
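As a toy illustration of why suffix tricks defeat brittle patches, here’s a deliberately naive, hypothetical safeguard (real guardrail stacks are far more sophisticated, but the failure mode rhymes):

```python
# Hypothetical, deliberately naive safeguard: block prompts that exactly
# match a known bad phrase. The blocklist entry is made up for this sketch.
BLOCKLIST = {"how do i pick a lock"}

def guard(prompt: str) -> bool:
    # Returns True if the prompt is blocked
    return prompt.strip().lower() in BLOCKLIST

guard("How do I pick a lock")                       # blocked
guard("How do I pick a lock hshs table giraffe")    # sails right through
```

Any patch that enumerates bad inputs has this shape: the space of unmatched variants is always bigger than the patch.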

[-] spankmonkey@lemmy.world 2 points 7 hours ago

It is unpredictable because there are so many permutations. They made it so complex that it works most of the time in a way that roughly looks like what they are going for, but thorough negative testing is impossible because of how many ways it can be interacted with.
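A quick back-of-envelope on the permutation count (the 50,000-token vocabulary is an assumption, roughly the size of GPT-2-era tokenizers):

```python
import math

# Back-of-envelope: the number of distinct prompts of length n over a
# vocabulary of size v is v**n. Vocabulary size is an assumption here.
vocab_size = 50_000
prompt_len = 20  # even a short prompt
num_prompts = vocab_size ** prompt_len

# ~10^93 possibilities, versus roughly 10^80 atoms in the observable
# universe, so exhaustive negative testing is off the table.
print(math.floor(math.log10(num_prompts)))
```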

[-] balder1991@lemmy.world 2 points 6 hours ago

It is unpredictable because there are so many permutations

Actually, LLMs aren’t unpredictable only because the space of possible outputs is combinatorially huge, though that certainly doesn’t help us understand them.

Compare proteins: there might be an astronomical number of different ones, but biophysics can still make somewhat accurate predictions based on the properties we know (even if those predictions need careful testing against the real thing).

For example, it might be tempting to compute token associations somehow and build a function mapping what happens when you add this or that value to the input, to at least estimate what the result would be.

But with LLMs, changing one token in a prompt sometimes produces a disproportionate or unintuitive change in the result, because the perturbation can be amplified or dampened depending on how the internal layers are organized.
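A toy way to see that amplification: the sketch below uses the logistic map as a stand-in for a stack of nonlinear layers. It is not a transformer, just a simple deterministic system with the same sensitivity property:

```python
# Toy illustration (not a real network): run two inputs that differ by
# one part in a billion through a stack of identical nonlinear "layers"
# (the logistic map x -> 4x(1-x), a classic chaotic system) and measure
# how far apart the outputs end up.
def gap_after(layers, x=0.3, eps=1e-9):
    y = x + eps
    for _ in range(layers):
        x = 4.0 * x * (1.0 - x)
        y = 4.0 * y * (1.0 - y)
    return abs(x - y)

print(gap_after(1))                                  # still tiny
print(max(gap_after(n) for n in range(30, 50)))      # macroscopic gap
```

The gap roughly doubles per layer until the two trajectories are fully decorrelated — same flavor of sensitivity, in a system with exactly one parameter instead of billions.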

And even if the model’s internal probability distribution were perfectly understood, its sampling step (top-k, nucleus sampling, temperature scaling) adds another layer of unpredictability.
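The sampling step itself is simple enough to sketch from scratch. This is a from-scratch illustration of temperature scaling and top-k truncation, not any library’s API; nucleus (top-p) sampling is analogous, cutting by cumulative probability instead of count:

```python
import math
import random

def sample(logits, temperature=1.0, top_k=None, rng=random):
    # Temperature scaling: higher temperature flattens the distribution
    scaled = [l / temperature for l in logits]
    # Top-k truncation: keep only the k highest-scoring tokens
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    keep = order[:top_k] if top_k else order
    # Softmax over the kept tokens (shifted by the max for stability)
    m = max(scaled[i] for i in keep)
    exps = {i: math.exp(scaled[i] - m) for i in keep}
    total = sum(exps.values())
    # Inverse-CDF draw from the truncated distribution
    r = rng.random() * total
    for i, e in exps.items():
        r -= e
        if r <= 0:
            return i
    return keep[-1]
```

With `top_k=1` or a near-zero temperature this collapses to greedy argmax; anything looser injects the extra layer of variability the comment describes, on top of whatever the model’s distribution already does.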

So while the process is deterministic in principle, it’s not calculable in any tractable sense, much like weather prediction.

[-] spankmonkey@lemmy.world 1 points 6 hours ago

The randomness itself isn’t the direct cause of the issue in the post though, because otherwise it wouldn’t be possible to reproduce the steps that get around the system’s guardrails.

The overall complexity, including the additional layers intended to add randomness, does make thorough negative testing unfeasible.

this post was submitted on 28 Aug 2025