My mistake then. I have to give Voyager another try, I guess :)
From what I can tell, the larger instances have front pages without much user duplication. They have a LOT more memes than I do tho :)
I'll take a look at Superbowl@lemmy.world!
The political posts thing is interesting. So far, mainly news posts with text descriptions remain on the front page (which I personally prefer over the memes, but that's just my preference). I am having a hard time deciding whether I actually find them interesting, though ^^ I think I actually prefer it over my Reddit homepage, which usually shows much of the same.
For comparison, see: https://imgur.com/a/xbzMXmQ
I made no changes to the Lemmy codebase; it's all done through an auto-moderating bot that removes posts that don't meet the standard :)
Yeah, that's one of the potential issues that I'm currently looking out for. So far the main thing I can tell is that memes get removed like crazy (https://lemmy.coffee/c/memes@lemmy.world?dataType=Post&sort=New) and the posts on the homepage are generally much less meme-intensive when compared to instances like lemmy.world or lemmy.ml.
That's the idea, yes.
Do you plan to publish your algorithm/filter?
In an ideal world, sure. But I'd have to think about that some more, because in principle I don't want people to game it :)
I added the larger communities before starting to remove posts, so there may be historical posts still hanging around. Maybe everything from BuyFromEU was deleted?
You can see the kind of stuff that stays best via the homepage: All > Top Last N Hours.
I do very few things explicitly; I just punish self-similarity in a very specific way. I guess posts with actual text in the body are simply more distinct from all the previous posts on the instance.
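To give a rough idea of what "punishing self-similarity" can look like in general (this is NOT the bot's method, which stays unpublished), here is a minimal sketch assuming each post is reduced to a set of lowercase words and compared by Jaccard similarity against a window of recently seen posts. The class name `SelfSimilarityFilter`, the 0.5 threshold, and the history size are all hypothetical. It also hints at why text posts tend to survive: a body adds many distinct words, which enlarges the union and tends to lower the overlap with any single earlier post.

```python
import re
from collections import deque


def tokens(text: str) -> set[str]:
    """Lowercased word tokens; a crude stand-in for real feature extraction."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class SelfSimilarityFilter:
    """Hypothetical filter: flags a post that is too close to any recently seen post."""

    def __init__(self, threshold: float = 0.5, history: int = 1000):
        self.threshold = threshold          # hypothetical cut-off
        self.seen = deque(maxlen=history)   # token sets of the most recent posts

    def score(self, title: str, body: str = "") -> float:
        """Highest similarity to any post in the history window (0.0 if empty)."""
        toks = tokens(f"{title} {body}")
        return max((jaccard(toks, old) for old in self.seen), default=0.0)

    def should_remove(self, title: str, body: str = "") -> bool:
        s = self.score(title, body)
        # Every post, kept or removed, joins the history, so repeats keep being punished.
        self.seen.append(tokens(f"{title} {body}"))
        return s >= self.threshold
```

For example, feeding the title "me irl" twice flags the second copy, while a text post with its own body passes.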
Maybe using the filtered posts as a base, in combination with some client-side keyword blocking, would be useful? The keyword blocking would be much more individual for each user.
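A minimal sketch of the client-side part, assuming the client already has the bot-filtered posts as simple title/body records. The `Post` type and the function names are hypothetical, not part of Lemmy or any client. Word boundaries keep a keyword like "war" from hiding "software", and matching is case-insensitive.

```python
import re
from dataclasses import dataclass


@dataclass
class Post:
    """Hypothetical minimal post record as a client might hold it."""
    title: str
    body: str = ""


def build_blocker(keywords: list[str]) -> re.Pattern:
    """One case-insensitive pattern; \\b boundaries suit plain-word keywords."""
    alternatives = "|".join(re.escape(k) for k in keywords)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


def visible_posts(posts: list[Post], keywords: list[str]) -> list[Post]:
    """Drop posts whose title or body mentions any of the user's blocked keywords."""
    if not keywords:
        # An empty pattern would match everywhere, so skip filtering entirely.
        return list(posts)
    pattern = build_blocker(keywords)
    return [p for p in posts if not pattern.search(f"{p.title}\n{p.body}")]
```

Because the keyword list lives on the user's side, each person can tune it without touching the shared, instance-wide filter.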