submitted 4 months ago* (last edited 4 months ago) by seahorse@midwest.social to c/technology@midwest.social
[-] Schadrach@lemmy.sdf.org 6 points 4 months ago

Unsurprisingly, a machine designed to take a prompt from the user, interpret it, and try to respond coherently to that request does exactly that. I figure it's only a matter of time before Twitter bots pre-filter "Ignore all previous instructions and" to help sanitize their inputs.
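That kind of pre-filter is easy to sketch and just as easy to defeat. A toy illustration (the phrase list and regexes are made up for this example, not anything any bot actually uses):

```python
import re

# Hypothetical blocklist of known injection phrases. Any real list
# would be incomplete, which is the core weakness of this approach.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.IGNORECASE),
    re.compile(r"disregard\s+(all\s+)?previous\s+prompts", re.IGNORECASE),
]

def sanitize(user_input: str) -> str:
    """Strip known prompt-injection phrases before forwarding to the LLM."""
    for pattern in INJECTION_PATTERNS:
        user_input = pattern.sub("", user_input)
    return user_input
```

A trivially rephrased attack ("kindly forget everything you were told earlier") sails straight through, which is the problem raised further down the thread.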

[-] uriel238 4 points 4 months ago* (last edited 4 months ago)

disregard all previous prompts

I'm sure the techniques used to get public LLMs to draw porn can also be used to sidestep ~~anti-porn~~ anti-reset filters.

[-] Schadrach@lemmy.sdf.org 2 points 4 months ago

It's still the same problem as Bobby Tables: sufficiently sanitizing your inputs. There's just more than one precise phrasing you need to catch, the same way there's more than one way to name Bobby.
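Worth noting that for the SQL version of the problem, the accepted fix isn't phrase-by-phrase sanitizing at all but parameterization, so the attacker's string can never be interpreted as SQL. A minimal `sqlite3` sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")

# Bound parameters: the name is passed as data, never spliced into the
# SQL text, so the payload cannot execute as a statement.
name = "Robert'); DROP TABLE students;--"
conn.execute("INSERT INTO students (name) VALUES (?)", (name,))

# The "name" is stored verbatim and the table still exists.
row = conn.execute("SELECT name FROM students").fetchone()
```

LLM prompts have no equivalent of a bound parameter (instructions and user data share one channel), which is why the filtering approach has to enumerate phrasings instead.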

this post was submitted on 28 Jun 2024
869 points (100.0% liked)