413

Instance Admins: Check Your Instance for Vote Manipulation Accounts [PSA] (dubvee.org)

submitted 1 month ago* (last edited 1 month ago) by ptz@dubvee.org to c/fediverse@lemmy.world

131 comments fedilink hide all child comments

Over the past 5-6 months, I've been noticing a lot of new accounts spinning up that look like this format:

https://instance.xyz/u/gmbpjtmt
https://instance.xyz/u/tjrwwiif
https://instance.xyz/u/xzowaikv

What are they doing?

They're boosting and/or downvoting mostly, if not exclusively, US news and politics posts/comments to fit their agenda.

Edit: Could also be manipulating other regional news/politics, but my instance is regional and doesn't subscribe to those which limits my visibility into the overall manipulation patterns.

What do these have in common?

Most are on instances that have signups without applications (I'm guessing the few that are on instances with applications may be from before those were enabled since those are several months old, but just a guess; they could have easily just applied and been approved.)
Most are random 8-character usernames (occasionally 7 or 9 characters)
Most have a common set of users they're upvoting and/or downvoting consistently
No posts/comments
No avatar or bio (that's pretty common in general, but combine it with the other common attributes)
Update: Have had several anonymous reports (thanks!) that these users are registering with an @sharklasers.com email address which is a throwaway email service.

What can you, as an instance admin, do?

Keep an eye on new registrations to your instance. If you see any that fit this pattern, pick a few (and a few off this list) and see if they're voting along the same lines. You can also look in the login_token table to see if there is IP address overlap with other users on your instance and/or any other of these kinds of accounts.

You can also check the local_user table to see if the email addresses are from the same provider (not a guaranteed way to match them, but it can be a clue) or if they're they same email address using plus-addressing (e.g. user+whatever@email.xyz, user+whatever2@emai.xyz, etc).

Why are they doing this?

Your guess is as good as mine, but US elections are in a few months, and I highly suspect some kind of interference campaign based on the volume of these that are being spun up and the content that's being manipulated. That, or someone, possibly even a ghost or an alien life form, really wants the impression of public opinion being on their side. Just because I don't know exactly why doesn't mean that something fishy isn't happening that other admins should be aware of.

Who are the known culprits?

These are ones fitting that pattern which have been identified. There are certainly more, but these have been positively identified. Some were omitted since they were more garden-variety "to win an argument" style manipulation.

These all seem to be part of a campaign. This list is by no means comprehensive, and if there are any false positives, I do apologize. I've tried to separate out the "garden variety" type from the ones suspected of being part of a campaign, but may have missed some.

[New: 9/18/2024]: https://thelemmy.club/u/fxgwxqdr
[New: 9/18/2024]: https://discuss.online/u/nyubznrw
[New: 9/18/2024]: https://thelemmy.club/u/ththygij
[New: 9/18/2024]: https://ttrpg.network/u/umwagkpn
[New: 9/18/2024]: https://lemdro.id/u/dybyzgnn
[New: 9/18/2024]: https://lemmy.cafe/u/evtmowdq
https://leminal.space/u/mpiaaqzq
https://lemy.lol/u/ihuklfle
https://lemy.lol/u/iltxlmlr
https://lemy.lol/u/szxabejt
https://lemy.lol/u/woyjtear
https://lemy.lol/u/jikuwwrq
https://lemy.lol/u/matkalla
https://lemmy.ca/u/vlnligvx
https://ttrpg.network/u/kmjsxpie
https://lemmings.world/u/ueosqnhy
https://lemmings.world/u/mx_myxlplyx
https://startrek.website/u/girlbpzj
https://startrek.website/u/iorxkrdu
https://lemy.lol/u/tjrwwiif
https://lemy.lol/u/gmbpjtmt
https://thelemmy.club/u/avlnfqko
https://lemmy.today/u/blmpaxlm
https://lemy.lol/u/xhivhquf
https://sh.itjust.works/u/ntiytakd
https://jlai.lu/u/rpxhldtm
https://sh.itjust.works/u/ynvzpcbn
https://lazysoci.al/u/sksgvypn
https://lemy.lol/u/xzowaikv
https://lemy.lol/u/yecwilqu
https://lemy.lol/u/hwbjkxly
https://lemy.lol/u/kafbmgsy
https://discuss.online/u/tcjqmgzd
https://thelemmy.club/u/vcnzovqk
https://lemy.lol/u/gqvnyvvz
https://lazysoci.al/u/shcimfi
https://lemy.lol/u/u0hc7r
https://startrek.website/u/uoisqaru
https://jlai.lu/u/dtxiuwdx
https://discuss.online/u/oxwquohe
https://thelemmy.club/u/iicnhcqx
https://lemmings.world/u/uzinumke
https://startrek.website/u/evuorban
https://thelemmy.club/u/dswaxohe
https://lemdro.id/u/efkntptt
https://lemy.lol/u/ozgaolvw
https://lemy.lol/u/knylgpdv
https://discuss.online/u/omnajmxc
https://lemmy.cafe/u/iankglbrdurvstw
https://lemmy.ca/u/awuochoj
https://leminal.space/u/tjrwwiif
https://lemy.lol/u/basjcgsz
https://lemy.lol/u/smkkzswd
https://lazysoci.al/u/qokpsqnw
https://lemy.lol/u/ncvahblj
https://ttrpg.network/u/hputoioz
https://lazysoci.al/u/lghikcpj
https://lemmy.ca/u/xnjaqbzs
https://lemy.lol/u/yonkz

Edit: If you see anyone from your instance on here, please please please verify before taking any action. I'm only able to cross-check these against the content my instance is aware of.

you are viewing a single comment's thread
view the rest of the comments

[-] ABasilPlant@lemmy.world 50 points 1 month ago* (last edited 1 month ago)

My bachelor's thesis was about comment amplifying/deamplifying on reddit using Graph Neural Networks (PyTorch-Geometric).

Essentially: there used to be commenters who would constantly agree / disagree with a particular sentiment, and these would be used to amplify / deamplify opinions, respectively. Using a set of metrics [1], I fed it into a Graph Neural Network (GNN) and it produced reasonably well results back in the day. Since Pytorch-Geomteric has been out, there's been numerous advancements to GNN research as a whole, and I suspect it would be significantly more developed now.

Since upvotes are known to the instance administrator (for brevity, not getting into the fediverse aspect of this), and since their email addresses are known too, I believe that these two pieces of information can be accounted for in order to detect patterns. This would lead to much better results.

In the beginning, such a solution needs to look for patterns first and these patterns need to be flagged as true (bots) or false (users) by the instance administrator - maybe 200 manual flaggings. Afterwards, the GNN could possibly decide to act based on confidence of previous pattern matching.

This may be an interesting bachelor's / master's thesis (or a side project in general) for anyone looking for one. Of course, there's a lot of nuances I've missed. Plus, I haven't kept up with GNNs in a very long time, so that should be accounted for too.

Edit: perhaps IP addresses could be used too? That's one way reddit would detect vote manipulation.

[1] account age, comment time, comment time difference with parent comment, sentiment agreement/disgareement with parent commenters, number of child comments after an hour, post karma, comment karma, number of comments, number of subreddits participated in, number of posts, and more I can't remember.

[-] ptz@dubvee.org 7 points 1 month ago

That would definitely work for rooting out ones local to an instance, but not cross-instance. For example, none of these were local to my instance, so I don't have email or IP data for those and had to identify them based on activity patterns.

I worked with another instance admin who did have one of these on their instance, and they confirmed IP and email provider overlap of those accounts as well as a local alt of an active user on another instance. Unfortunately, there is no way to prove that the alt on that instance actually belongs to the "main" alt on another instance. Due to privacy policy conflicts, they couldn't share the actual IP/email values but could confirm that there was overlap among the suspect accounts.

Admins could share IP and email info and compare, but each instance has its own privacy policy which may or may not allow for that (even for moderation purposes). I'm throwing some ideas around with other admins to find a way to share that info that doesn't violate the privacy of any instances' users. My current thought was to share a hash of the IP address, IP subnet, email address, and email provider. That way those hashes could be compared without revealing the actual values. The only hiccup with that is that it would be incredibly easy to generate a rainbow table of all IPv4 addresses to de-anonymize the IP hashes, so I'm back to square one lol.

[-] ABasilPlant@lemmy.world 4 points 1 month ago* (last edited 1 month ago)

Yes, this would essentially be a detecting mechanism for local instances. However, a network trained on all available federated data could still yield favorable results. You may just end up not needing IP Addresses and emails. Just upvotes / downvotes across a set of existing comments would even help.

The important point is figuring out all possible data you can extract and feed it to a "ML" black box. The black box can deal with things by itself.