How to sort a list of communities by engagement? (programming.dev)

submitted 1 week ago* (last edited 1 week ago) by CoderSupreme@programming.dev to c/python@programming.dev


I have a list of communities, each with total votes, upvote percentage, and the community name. I want to sort the list by 'engagement,' which would be some combination of total votes and upvote percentage. What is the best way to do this? What would be the best measure of 'engagement' with each community given this data?

top 4 comments


Dunstabzugshaubitze@feddit.org 1 points 1 week ago

https://docs.python.org/3/library/stdtypes.html#list.sort

Supply a function as the key argument that returns a numerical value for whatever you define as "engagement".
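A minimal sketch of that suggestion. The sample tuples and the scoring formula (estimated upvotes, i.e. total votes times upvote percentage) are assumptions made up for illustration; they are only one possible definition of "engagement", not a standard one.

```python
# Hypothetical sample data: (name, total_votes, upvote_percentage)
communities = [
    ("python", 1200, 92.0),
    ("rust", 300, 98.0),
    ("memes", 5000, 71.0),
]


def engagement(community):
    """Assumed score: estimated number of upvotes."""
    name, total_votes, upvote_pct = community
    return total_votes * upvote_pct / 100


# list.sort() calls engagement() once per item and sorts in place;
# reverse=True puts the highest score first.
communities.sort(key=engagement, reverse=True)

names = [name for name, _, _ in communities]
print(names)
```

Swapping in a different definition of engagement only means changing the body of engagement(); the sort call stays the same.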

gigachad@piefed.social 1 points 1 week ago

If you have multiple values to sort on, I suggest using pandas: build a pandas.DataFrame and sort it with the .sort_values() method, which accepts multiple sorting keys. Your data sounds too complex for plain lists, where it is easy to lose track of which value is which. You could work with dictionaries, but then you need to write some ugly loops or comprehensions.
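Roughly what that could look like, assuming pandas is installed. The column names and the sample rows are invented for the example; the sort uses total votes as the primary key and upvote percentage as the tie-breaker.

```python
import pandas as pd

# Hypothetical data; column names are an assumption, not from the original post.
df = pd.DataFrame(
    {
        "name": ["python", "rust", "memes", "linux"],
        "total_votes": [1200, 300, 5000, 1200],
        "upvote_pct": [92.0, 98.0, 71.0, 85.0],
    }
)

# Sort by total votes first, then break ties by upvote percentage,
# both in descending order.
ranked = df.sort_values(by=["total_votes", "upvote_pct"], ascending=[False, False])

ranked_names = ranked["name"].tolist()
print(ranked_names)
```

Note that this is a lexicographic sort: upvote percentage only matters between communities with identical vote totals. To blend the two values into one number, you could add a computed score column and sort on that instead.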

Auster@thebrainbin.org 1 points 1 week ago

On the sorting logic, not the code itself: maybe calculate the score differently for each range of total votes?

For example, let's say there are 5 communities with up to 100 total votes, 5 with up to 1000 and 5 with up to 10000. For the first group you could divide the upvote percentage by some constant, say 10; for the middle group you'd leave the percentage unchanged; and for the third group you'd multiply the percentage by that same constant. The resulting number (no longer a percentage) could indicate the engagement.
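A small sketch of that tiered idea. The exact tier boundaries (100 and 1000 total votes, inclusive) and the sample communities are assumptions; the constant 10 is the arbitrary one from the example above.

```python
SCALE = 10  # arbitrary constant, as in the example above


def tiered_engagement(total_votes, upvote_pct):
    """Scale the upvote percentage according to the vote-count tier."""
    if total_votes <= 100:       # small tier: shrink the percentage
        return upvote_pct / SCALE
    if total_votes <= 1000:      # middle tier: leave it unchanged
        return upvote_pct
    return upvote_pct * SCALE    # large tier: boost the percentage


# Hypothetical data: (name, total_votes, upvote_percentage)
communities = [
    ("small", 80, 95.0),
    ("medium", 900, 80.0),
    ("large", 8000, 60.0),
]

ranked = sorted(
    communities,
    key=lambda c: tiered_engagement(c[1], c[2]),
    reverse=True,
)
ranked_names = [name for name, _, _ in ranked]
print(ranked_names)
```

One thing to watch with hard tiers: a community with 1001 votes scores ten times higher than one with 1000 votes at the same percentage, so the ranking jumps at the boundaries.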

it_depends_man@lemmy.world 1 points 1 week ago

You want some kind of decay function for when that engagement happened.

The rest is sort of up to you and depends a bit on your math intuition. If you do something like (total votes / 10,000) + upvote fraction, the upvote share will weigh heavily until the total gets close to 10,000 votes; beyond that, the vote count will dominate no matter how positively the post was received. But the 10,000 is arbitrary.

My advice would be to create some fake data for plausible scenarios, such as (well liked, low vote), (lots of votes, medium %) and (lots of votes, but old), and then experiment with some functions and curves until you find a mix you like.
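A rough sketch of that kind of experiment. Everything concrete here is an assumption for illustration: the three fake scenarios, the age_days field (the original data has no timestamps), the 10,000 scale, and the exponential decay with a 7-day half-life.

```python
# Hypothetical scenarios: (label, total_votes, upvote_fraction, age_days)
scenarios = [
    ("well liked, low vote", 50, 0.98, 1),
    ("lots of votes, medium %", 20000, 0.60, 1),
    ("lots of votes, but old", 20000, 0.90, 60),
]


def linear_mix(votes, frac, age_days, scale=10_000):
    """votes / scale + upvote fraction; age is ignored."""
    return votes / scale + frac


def decayed_mix(votes, frac, age_days, scale=10_000, half_life_days=7):
    """Same mix, halved for every half_life_days of age (assumed decay)."""
    decay = 0.5 ** (age_days / half_life_days)
    return linear_mix(votes, frac, age_days, scale) * decay


def rank(score):
    """Return scenario labels ordered from highest to lowest score."""
    ordered = sorted(scenarios, key=lambda s: score(*s[1:]), reverse=True)
    return [label for label, *_ in ordered]


# Compare how each candidate function orders the same fake data.
for score in (linear_mix, decayed_mix):
    print(score.__name__, rank(score))
```

Without decay, the old high-vote scenario comes out on top; with the assumed half-life it drops to last place. Changing scale or half_life_days and rerunning is a quick way to see which mix matches your intuition.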