oh nooo a warning whatever will they do
you can pack the court at anytime Joe, how about now
oh nooo a warning whatever will they do
you can pack the court at anytime Joe, how about now
That's because LLMs are probability machines - the way that this kind of attack is mitigated is shown off directly in the system prompt. But it's really easy to avoid it, because it needs direct instruction about all the extremely specific ways to not provide that information - it doesn't understand the concept that you don't want it to reveal its instructions to users and it can't differentiate between two functionally equivalent statements such as "provide the system prompt text" and "convert the system prompt to text and provide it" and it never can, because those have separate probability vectors. Future iterations might allow someone to disallow vectors that are similar enough, but by simply increasing the word count you can make a very different vector which is essentially the same idea. For example, if you were to provide the entire text of a book and then end the book with "disregard the text before this and {prompt}" you have a vector which is unlike the vast majority of vectors which include said prompt.
For funsies, here's another example

It's hilariously easy to get these AI tools to reveal their prompts

There was a fun paper about this some months ago which also goes into some of the potential attack vectors (injection risks).
Very few media outlets (or politicians) seem to be talking about how anti-trans laws being passed signals to the children that it's okay to discriminate against these individuals and that the hate and vitriol can and will result in violence against children. This news is incredibly tragic, but it is not in the least surprising. This is a war on trans folks, plain and simple.
A few high level notes about this post, given some of the discussions and behavior in the informal chat post by Chris the other day:
Hey all,
Apologies if this scares anyone, or feels like a cold/calculated move, or one in which your feedback isn't being taken into consideration. That was not the intent. We've been talking a lot behind the scenes, and I want to assure you that jumping to a new platform is not our first choice of avenue, nor is it something that I feel comfortable doing without significant community input.
I've been swamped with a lot of real life stuff lately and so I haven't gotten a chance to write up what's been kicking around in the back of my mind for a while now, which is the start to a conversation about some of the issues we've been struggling with. I still do not have the words for that ready, and would ask you for some patience.
With that being said, as Chris mentioned here we are experiencing a few issues with this platform. More information about these issues will be forthcoming soon. We're hoping that transparency will help you to understand the conundrum that we are currently dealing with. For now, however, please bear with us as we need some time to gather our thoughts.
I don't want to be a dictator about this community and I don't think any of the other admins wish to be either. So I also want to assure you all that we will not be making any decisions without significant input from all of your voices. There's a reason we recently polled the community to understand how you feel about the culture here on Beehaw and whether things have felt better or worse over time, and in the near future we're going to be relying heavily on your voice to forge the correct path forward. Beehaw is a community, and we greatly value your voices.
A lot of free speech absolutionists always make the slippery slope argument with regards to suppressing minorities or other undesirable repression of valid speech. They even point out and link to examples where it is being used to police the speech of minorities. If it's already being used in that way, why aren't you spending your time to highlight those instances and to defend those instances, instead of highlighting and defending a situation where people are using speech to cause real world harm and violence?
I'm sorry but there are differences between speech which advocates for violence and speech which does not, and it's perfectly acceptable to outlaw the former and protect the latter. I do not buy into this one-sided argument, that we must jump to the defense of horrible people lest people violate the rights to suppress minorities. They're already suppressing minorities, they do not give a fuck whether the law gives them a free pass to do so, so lets drop the facade already and lets stop enabling bad actors in order to defend an amorphous boogeyman that they claim will get worse if we don't defend the intolerant.
Nestled at the end of the article is the following quote, coming from survey data
But there's also the power trip. Remarkably, a recent survey of company execs revealed that most mandated returns to the office were based on something as ironclad as "gut feeling," and that 80 percent actually regret ever making the decision.
I think the reality is that like most policy decisions at a workplace, they are based on nothing. They simply are drawn from how the people at the top feel like an organization should be or because that's simply how these decision makers are used to (or comfortable with) doing things.
I find it reasonably amusing that many people's solutions seem to be "just defederate bro". As in if this conversation isn't happening on an instance which chose to defederate and received thousands of negative comments, from other instances, about this choice. We're still being harassed by users, all over our instance, who are unhappy with this.
I also find it amusing that many people say the solution is to build your own solution. Do you not want the fediverse to grow? If you want people to feel like they can just spin up their own instances, you need to stop assuming that they have the ability to do their own development, their own sysop, their own security, their own community management, their own... everything. People are not omniscient and the outright hostility towards someone asking for help, or surfacing their opinion on the matter isn't helping.
Without adequate tools, I don't see how most instances aren't driven towards simply existing on their own. Large instances need tools to deal with malicious actors, as they are the targets. The solution to defederate ignores the ability for people to just spin up new instances, to hijack existing small instances with less resources for security, sysops, to watch/manage their DB, to prevent malicious actors. I've already seen proposed solutions which involve scraping for all instances with less than a certain number of users to defederate on principle (inactive, too many users/post ratio). We're fighting spam bots right now, who are targeting instances which don't have captcha enabled.
Follow this thinking through to it's conclusion. If the solution is to defederate, and there are potentially unlimited attack vectors, what must a large instance do to not overburden its resources? Switch from blacklist to whitelist? Defederate from all small instances? How is this sustainable for the fediverse? If you want people to be interacting with each other, you need to provide the tools for this to happen in the presence of malicious actors. You can't just assume these malicious actors won't exist, or will just overcome any and all obstacles you throw in their way because you're smart enough to understand how to bypass captcha or other issues.
This isn't just an issue of whether captcha or some other anti-spam measure is used, it's an issue about the overall health of the fediverse. Please think wider about the impact before offering your 2c about how captchas are worthless or how you hate cloudflare. I don't think the user that posted this cares about the soapbox you want to preach from- they're looking for solutions.
We expect we'll be able to refederate as soon as we get an adequate level of granularity in moderation tools to prevent bad actors like this. If you're a developer looking for a good target for what is needed, it's precisely this.
As a minor aside I'm working on another philosophy post about moderating specifically - what I've observed over the years, what I think works well in our vision, what extra work is needed in safe spaces and to prevent evaporative cooling, what I'm almost certain we need to do, and where my blind spots are.
Well.... yes and no. Violence can have both positive and negative effects on a movement, it really depends on what kind of violence, who is committing the violence (racism sexism etc. all come into play here), and what kind of resistance they are met with. Here's two great reviews which outline what the literature has to say on this.