Developer / Potential Contributor Question: how to add a custom post/comment ranking algorithm to Lemmy?
How would I add a new ranking algorithm to Lemmy as a contributor? I'm a developer by trade, but unfamiliar with Rust and the codebase of Lemmy specifically. It doesn't seem like Lemmy has a concept of 'ranking plugins', so whatever I do would have to involve an MR.
Specifically, I'd like to introduce a ranking system that approximates Proportional Approval Voting, specifically using Thiele's elimination methods, like is used in LiquidFeedback.
I'm pretty sure that with a few tweaks to Thiele's rules, I can compute a complete ranking of all comments in a thread in O(ClogC + E + VlogC), where C is the number of comments, E is the total number of likes, and V is the number of users. This would also support partial approvals, upvotes could decay with age.
I believe this would mitigate the tendency towards echo chambers that Lemmy inherits from Reddit. Lemmy effectively uses Block Approval Voting with decays to rank comments and posts, leading to the same people dominating every conversation.
The ranking is implemented in a rather complicated way for performance reasons:
post.hot_rank)This is exactly the info I'm looking for, thanks! I knew there'd have to be some kind of scheduled task to recompute the rankings (IIRC the lemmy docs say ~10 minutes for recomputing ranks), I just wasn't sure where it was.
The change that would require the least effort to implement my voting system (whether the lemmy maintainers would accept the change notwithstanding) would be to target the schedule task, and introduce a server-wide configuration option that would allow admins to pick whether they're using Block Approval (what we have now) or Proportional Approval (what I'm introducing) based algorithms for their server's "hot" algorithm. No API or frontent changes required. Then, work towards community mods being able to set that on a per-community basis.
Something for me to experiment with, anyway.
Sounds interesting, though from your links it's not clear to me how exactly it works. Depending on that it could make more sense to implement as a separate sort option, then each user could try and compare it to the existing sorts.
I was thinking of it as a drop-in replacement for "hot" just so that it doesn't require any changes on the UI to implement. I'm a bit rusty with UI development, lol. The frontends wouldn't have to add a new button, and the Lemmy API wouldn't need to add a new sort type. That said, maybe that sort of thing is easy to do?
As far as it would work, Thiele's elimination rules is computed roughly as follows (I'm assuming that only upvotes are counted; I haven't considered yet if the process works if disapprovals count as a vote of "-1" or how the process could remain scalable if an abstention counts as a vote of "0.5":
For this algorithm, the
yield the removed poststatement will return the sorted posts in reverse order. So worst to best. You could also interpret that statement as "Give the post a rank in the final sorting ofcount(posts) - (i++)".Thiele says that process can be used to elect a committee of size N by stopping your removal when N votes remain. But because it's a "house monotonic" process (electoral speak for "increasing the size of the committee by one and re-running an election is guaranteed not to cost any existing members their seat), I figure it could be repurposed to produce a ranking as well - the top one item is "best one", the top two items are the best two, the top three are the best three, etc.
To make the above process work for approvals that decay over time, we'd just treat a decayed approval as a partial approval. I still have some work to do on how exactly to integrate partial approvals into the "resistance to removing each post" calculations without ruining my time complexity. But basically it's a proportional score voting election instead of proportional approval.
Adding a new sort type is not a big deal, so dont worry about it. And a new admin setting for this would also require UI changes, so the new sort type is easier overall.
The current sort options calculate the rank for each post only from the data on that post (number of votes, creation time). Your suggested algorithm looks much more complicated than that, as it requires two iterations and needs to access data from multiple posts at once. Im not sure if this can really be implemented in a way thats performant enough for production use. Anyway feel free to open a pull request, then hopefully other contributors can help you to get it working.
Lemmy needs less algorithms rather than more. If they're accepting voting-related patches, maybe write one that turns off voting and just shows everythng chronologically. There is actually no way to do that right now. Sort by "new" orders the top level comments in a thread most recent first, but keeps the threading intact. There should be a setting that flattens out everything and sorts purely by timestamp.
The "Chat" option does just that, with newest comments at the top.
Then the conversation would be dominated by whoever has the most time to make posts, which isn't what I'm going for. I'd rather have a method that satisfies Proportional Representation.
Have you considered taking the approach from https://phanpy.social/, and let the sorting algorithms on the client side?
Not only would make your work independent from Lemmy, it would give you complete freedom to choose how to implement this.
I don't think it would work for my specific algorithm, unfortunately. To compute PAV, I need access to the "raw votes" of each individual user.
PAV doesn't need to know the identity of the user behind each upvote, but it does need to be able to correlate which upvotes originated from the same user so that once a user is determined to be "satisfied" by having a comment they upvoted given a high rank, all of their other upvotes need to be deweighted for the remainder of the process to "make room" for other users' opinions.
I checked the Lemmy API docs, and while that information is available at
/api/v4/post/like/listand/api/v4/comment/like/list, so I could have a frontend that scraped every comment and then every like of every comment in a community (ignoring how inefficient that would be), but both of those endpoints are tagged as "Admin-only".Plus, even if I could do that, to compute the rankings my process does need the upvotes of every comment in a post (or every post in a community) before it knows which one is on top, which seems too much to put on a client.
Or you could do the same thing that https://lemvotes.org/ does and follow the communities and actors to build this database on a separate server, which then can be used by the client(s).
Huh. TIL!
We don't have nearly enough comments for this to be a real problem
What if the lack of comments were because comments weren't proportionally representative?
Someone sees a discussion that interests them, so they see what the top comments are. But if the Hive Mind(tm) has spoken (even if just by awarding the top two or three comments to the same viewpoint), will they engage, or will they go somewhere else?
Remove the Hive Mind, and maybe you'll get more engagement?