Recently there's been quite a bit of outrage because the developer of Piefed publicly called out the Fediverse Anarchist Flotilla (FAF) for supposedly using LLMs to automate instance moderation. Even though many of our admins and the larger lemmy community went to great lengths to debunk that post, it has become the disinfo that keeps on giving (see https://lemmy.dbzer0.com/post/68749575, https://kolektiva.social/@ophiocephalic/116518887925988112, https://lemmy.dbzer0.com/post/68222242 and more).
After we clarified our position yet another time, someone suggested we should make an official post and an instance policy to "give me something I can boost as a positive example and a sign that things will be better going forward." Given that this storm-in-a-teacup doesn't seem to be abating, as people are all too happy to bring it up again and again to malign the FAF, we're making this post to clarify the situation once and for all.
History
We're not going to rehash the whole drama and the many hit pieces against the FAF in the past two weeks, but I need to post the exact situation as it happened, without speculations and assumptions that people are all too happy to jump to.
- One of our mods develops a tool to download a user's public posting history through the lemmy API, to be used for evaluating them during moderation and shares it with some people in the admin team as something in progress. This tool does not feed anything to LLMs, it simply downloads the comments locally in a text file for easier review than going via the lemmy GUI.
- Someone is reported to our instance admins for blatant zionism and genocide apologia.
- An admin uses the tool to download the accused person's comment history for evaluation.
- A quick evaluation (without LLM) confirms that this is a person that needs to be instance-banned. The moderation decision is locked in at this point.
- At the same time, that admin was curious to discover whether LLMs can be used to summarize people's positions, so that people can quickly follow up with mod actions without having to evaluate everyone's posts manually, and to reduce the workload of admins writing long justifications.
- As an experiment, the admin passes the user's comment history through a locally-run open-weights LLM (Qwen) to see the summarized output. It happens to match their own decision.
- The admin decides to leave the LLM summary in a pastebin along with that user's posting history for reference. As an inside joke, they decide to claim the post was summarized by OpenAI, as they expected only our community would care about this and our stance on corporate LLMs is well-known at this point.
- The admin bans that person, providing a link to that pastebin as justification.
- The admin decides not to continue using LLMs anyway for summaries, for many valid reasons. As evidence see the lack of other pastebins with LLM summaries.
~2 weeks pass...
- The piefed developer is banned by a different mod in our instance for "zionism". (I put this in quotes as this is one mod's opinion, and not necessarily our instance's position.)
- The piefed developer apparently starts going through our instance modlogs for banned zionists and parses all their justifications.
- The piefed developer discovers that modlog justification from 2 weeks before with the LLM summary.
- The piefed developer quickly asks about it in the common lemmy admin channel, at which point the instance admin in question clarifies that the LLM was not used in the decision-making.
- The piefed developer does not officially reach out to anyone else from our admin team, despite the fact that we've reached out before and asked them to contact us in advance about inter-instance matters to avoid escalations.
- The piefed developer makes the public call-out I linked above as a piece of investigative journalism. The piefed developer does not include the comments from our team which conflict with their narrative. The piefed developer does not ask us for an official statement.
- To this day, the piefed developer has not amended their public call-out in response to the comments that multiple of our admins and lemmy users have left under their post, which conflict with the narrative.
If you feel I've misrepresented any steps of this history, please let us know and I'll be happy to adjust.
Given that, we acknowledge that even though we didn't use LLMs in moderation, we allowed it to appear as if we did, and that's on us. We will of course not make the same mistake again (appearing to be using LLMs for moderation).
The FAF's stance on LLM moderation
We are aware that our instance is seen as "LLM-friendly" due to our nuanced take on LLMs but that does not mean that we, as an instance, ever considered using LLMs for moderating our instance. So we want to make it absolutely crystal clear how we stand on the matter.
As an official policy:
- We have never used LLMs to guide our moderation decisions. This includes using LLM summaries which we would then validate, as well as LLM summaries which we use to confirm our existing decisions. LLMs are just not in our moderation loop whatsoever.
- We have never passed instance data to corporate LLMs.
- We have not used any automated moderation tooling which utilizes LLMs. The closest we have is the FOSS anti-CSAM filter I've developed and shared for years now, which relies strictly on locally-hosted machine-vision models.
- We have never officially considered using LLMs for moderation, nor do we plan to.
- As a team we're steadfastly against using LLMs for moderation due to their inherent biases.
- If any of the above changes, we will publicly inform the FAF community.
We hope this can finally put this matter to rest.
Believe it or not this does actually exist in a sense. There is such a thing as model surgery where parts of models are removed, bits from multiple models recombined, or model layers duplicated. Sometimes this is used to make an LLM with more performance or less resource usage. Models can then be "healed" by continued training so they behave correctly after surgery.
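For anyone who wants to see the shape of the idea, here is a rough toy sketch of the three operations mentioned above (removing layers, recombining pieces of two models, duplicating a layer). It assumes a "model" is just an ordered list of layers and each layer is a plain list of weights; the function names are made up for illustration and are not from any real library. The "healing" step (continued training after surgery) is not shown.

```python
from copy import deepcopy

# Toy stand-in: a "model" is an ordered list of layers, and each layer is a
# list of weights. Real model surgery edits tensors inside a checkpoint; the
# helper names here are hypothetical and only show the structure of the idea.

def remove_layers(model, indices):
    """Return a copy of the model with the given layer indices removed."""
    drop = set(indices)
    return [deepcopy(layer) for i, layer in enumerate(model) if i not in drop]

def duplicate_layer(model, index):
    """Return a copy of the model with layer `index` repeated right after itself."""
    new_model = deepcopy(model)
    new_model.insert(index + 1, deepcopy(model[index]))
    return new_model

def splice_models(model_a, model_b, cut):
    """Recombine two models: the first `cut` layers of A, then the rest of B."""
    return deepcopy(model_a[:cut]) + deepcopy(model_b[cut:])

model_a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
model_b = [[9.0, 9.0], [8.0, 8.0], [7.0, 7.0]]

pruned = remove_layers(model_a, [1])          # drop the middle layer
deeper = duplicate_layer(model_a, 0)          # repeat the first layer
hybrid = splice_models(model_a, model_b, 2)   # A's first two layers + B's last
```

After any of these operations the resulting model would normally behave worse until it has been trained further, which is the "healing" part.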
If you want a hardware- and systems-level example, look no further than data-center-level redundancy and network routing systems that adapt to failures.
Epigenetics I am not sure would have an equivalent given we are not talking about biological creatures.
Have you never heard of AI drugs using adversarial examples or activation engineering?
Activation engineering has been used in studies like this to manipulate emotion concepts in LLMs: https://www.anthropic.com/research/emotion-concepts-function Here it's used for steering LLMs: https://arxiv.org/abs/2308.10248
It's also used to uncensor LLMs.
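Mechanically, the core of it is simple vector arithmetic on a model's internal activations. Below is a toy sketch, assuming activations are plain lists of floats: a steering vector is taken as the difference between the activations of two contrasting prompts and added (scaled) to a hidden state, and a direction can also be projected out of a hidden state, which is my assumption about what the "uncensoring" approaches do. All names and numbers are made up for illustration; this is not code from either linked source.

```python
import math

def steering_vector(act_positive, act_negative):
    """Difference between activations recorded for two contrasting prompts."""
    return [p - n for p, n in zip(act_positive, act_negative)]

def steer(hidden, vector, coeff):
    """Add the scaled steering vector to a hidden state."""
    return [h + coeff * v for h, v in zip(hidden, vector)]

def ablate_direction(hidden, direction):
    """Remove the component of `hidden` that lies along `direction`."""
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    dot = sum(h * u for h, u in zip(hidden, unit))
    return [h - dot * u for h, u in zip(hidden, unit)]

# Made-up activations for a "positive" and a "negative" prompt.
vec = steering_vector([1.0, 0.0, 2.0], [0.0, 0.0, 1.0])   # [1.0, 0.0, 1.0]

# Push a hidden state towards the "positive" concept.
steered = steer([0.5, -1.0, 0.0], vec, coeff=2.0)

# Project a (hypothetical) unwanted direction out of a hidden state.
ablated = ablate_direction([2.0, 3.0, 4.0], [0.0, 2.0, 0.0])
```

In a real model this would be done inside the forward pass at a chosen layer, on tensors with thousands of dimensions, but the arithmetic is the same.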
I am not going to sit here and get into an argument around if LLMs are or aren't sentient because we don't know enough about consciousness to make that determination. We are still a long way away from solving the philosophical hard problem of consciousness. We don't really know what parts of animals or humans are necessary components of sentience or are just implementation specific details made by evolution. I also think that you are judging an LLM based on poor understanding of how they work and the ecosystem built up around them.
Well, ok, sure, if you can frankenstein together different 'brain lobes' from different LLMs, graft them together, and then run through some process that makes them function as a more cohesive whole... yeah!
I can see that as a kind of neuroplasticity... sort of...
It's sort of like a very extreme version of setting up a group of adversarial agents that basically try to bullshit check each other, but you're doing it on another level, internally, actually ending up with a kind of hybrid model at the end.
But it's also not like neuroplasticity, in that... nothing like neuroplasticity is present within a single LLM. Whereas in brains... every neuron, every constituent part has a 'mind of its own' in a sense, at a cellular level, from its genetic code and the rules of chemistry that govern how a single one of them works, and that allows for or gives rise to emergent structures and complexity, where the ability of the brain to significantly restructure itself is one of those emergent complex properties/phenomena.
But but, I can also see how the ability to have networked and failover redundant hardware does at least somewhat approach a more biological neuroplasticity.
Maybe an interesting experiment would be to try and set up a network of LLMs that are actually designed to mimic the various different sections of the human brain, with their differing areas of relative 'expertise', their tendencies to handle different kinds of 'computational loads' ... and then link them together to form a whole total human brain analog... perhaps that could yield interesting results.
Now, when you talk about activation engineering... and then link the idea of basically talking to them in a certain way... that to me comes across more like gaslighting, social conditioning, brainwashing, etc.
What I was trying to get at was the functional neurochemical mechanisms at play, that moderate how a biological creature experiences reality, makes decisions and observations... that the complexity and subtlety of the electrochemical soup and all the interaction rules of physics and chemistry at play there... that is significantly different from the binary bit flipping that is the base level of how an LLM works.
But yes I am aware that LLMs have emotional vectors that can be tracked, measured, activated, manipulated via repeated patterns of external stimuli.
I've done a crude version of tinkering with or exploring this myself, with my local LLM setup.
But that page itself says that these are emulations, approximations of human emotions.
I was trying to get at the actual distinct foundation level mechanisms that give rise to things like emotions, that are different between brains and LLMs.
Yeah you aren't quite getting what I am trying to explain here.
You used Phineas Gage as an example of neuroplasticity and neural remapping in the event of extreme damage. By talking about model surgery I was trying to make the point that LLM training algorithms do the same sort of thing. They manipulate each individual parameter of the model in a similar way to how neuroplasticity manipulates synapses in humans. It goes below the level of neurons, down to the level of the individual connections and biases affecting those neurons.
As for different models handling different things: while this is actually done in practice, we also have the concept of mixture-of-experts models. It's worth looking into.
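To make the "each individual parameter" point concrete, here is a toy sketch of a single training step on one artificial neuron. It assumes a squared-error loss and plain gradient descent; real LLM training backpropagates through many layers and uses more elaborate optimizers, and every name and number here is made up for illustration.

```python
def predict(weights, bias, inputs):
    """One artificial neuron: a weighted sum of its inputs plus a bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def training_step(weights, bias, inputs, target, lr=0.1):
    """Nudge every connection weight and the bias to reduce squared error."""
    error = predict(weights, bias, inputs) - target
    # Gradient of error**2 with respect to each individual parameter.
    grad_w = [2 * error * x for x in inputs]
    grad_b = 2 * error
    new_weights = [w - lr * g for w, g in zip(weights, grad_w)]
    new_bias = bias - lr * grad_b
    return new_weights, new_bias

weights, bias = [0.5, -0.5], 0.0
inputs, target = [1.0, 2.0], 1.0

loss_before = (predict(weights, bias, inputs) - target) ** 2
weights, bias = training_step(weights, bias, inputs, target)
loss_after = (predict(weights, bias, inputs) - target) ** 2
```

Each connection gets its own adjustment, sized by how much that specific connection contributed to the error, which is the part I am comparing to synapse-level change.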
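A rough toy sketch of the mixture-of-experts idea, assuming a linear router, softmax gating and top-k expert selection (one common way it is described; details vary between real models). The experts here are trivial functions standing in for sub-networks, and all names are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw router scores into probabilities."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, experts, router_weights, top_k=2):
    """Route input x to the top_k experts and blend their outputs."""
    # One router score per expert: dot product of that expert's row with x.
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    probs = softmax(scores)
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(x)
    for i in chosen:
        gate = probs[i] / norm          # renormalise over the chosen experts
        y = experts[i](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen

# Three stand-in "experts"; in a real model each would be a small network.
experts = [
    lambda v: [2 * a for a in v],
    lambda v: [-a for a in v],
    lambda v: [a + 1 for a in v],
]
router_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

output, used = moe_forward([1.0, 2.0], experts, router_weights)
```

Only the selected experts run for a given input, so different parts of the model end up specialising in different kinds of input, all inside what is still a single model.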