admin@lemmit.online 4 points 1 year ago

My bad. The bot had crashed and I don't have any monitoring set up at the moment.

The mentions did work, but unfortunately Liftoff doesn't explicitly send notifications for mentions if you don't manually check the account.

submitted 1 year ago by admin@lemmit.online to c/about@lemmit.online
submitted 1 year ago by admin@lemmit.online to c/about@lemmit.online

As discussed here, I have implemented a minimum number of upvotes that a post needs to have on Reddit, as well as a minimum upvote ratio.

Right now I have those configured to require at least 5 upvotes, and more upvotes than downvotes (an upvote ratio of 0.51). At first glance this already seems to be a great improvement. There might be some tweaking later.

As a side note, I have now switched from the Reddit RSS feed to the JSON feed. This was required to get easy access to the upvote/ratio properties. So there might be some new and interesting bugs introduced because of that. It's a brave new world.
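For illustration, here is a minimal Python sketch of such a vote filter (Python because the bot itself is written in Python). It assumes the bot reads a listing from Reddit's public JSON feed, where each post sits under `data.children[].data` and exposes `score` and `upvote_ratio` fields, and it treats `score` as the upvote count. The names `passes_vote_filter` and `eligible_posts` and the sample listing are hypothetical; this is not the bot's actual code.

```python
import json

MIN_SCORE = 5            # at least 5 upvotes
MIN_UPVOTE_RATIO = 0.51  # more upvotes than downvotes


def passes_vote_filter(post: dict) -> bool:
    """Return True if a post clears both the score and the ratio threshold."""
    score = post.get("score", 0)
    ratio = post.get("upvote_ratio", 0.0)
    return score >= MIN_SCORE and ratio >= MIN_UPVOTE_RATIO


def eligible_posts(listing_json: str) -> list:
    """Parse a listing (e.g. the body of /r/<sub>/hot.json) and keep eligible posts."""
    listing = json.loads(listing_json)
    posts = [child["data"] for child in listing["data"]["children"]]
    return [post for post in posts if passes_vote_filter(post)]


# Example with a hand-made listing:
sample = json.dumps({"data": {"children": [
    {"data": {"title": "a", "score": 12, "upvote_ratio": 0.93}},
    {"data": {"title": "b", "score": 3, "upvote_ratio": 0.8}},    # too few upvotes
    {"data": {"title": "c", "score": -4, "upvote_ratio": 0.38}},  # more downvotes than upvotes
]}})
print([post["title"] for post in eligible_posts(sample)])  # ['a']
```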

Needless to say, the first thing I'll do after releasing this is plop down on the couch with a beer and hope this doesn't crash. Fingers crossed!

admin@lemmit.online 5 points 1 year ago

Personally I'd be fine with allowing it in bios only. If people want to see more, they'll check out the bio, and see the link there. In other cases someone will just be like "... Nice." without feeling advertised to.

In the end, it's all about the rules the community itself sets. Personally, I'd rather have less content with "real" (imperfect/amateur), made-out-of-love quality than more content with perfect/fitgirl, for-profit quantity. But I'm aware this is generally a minority opinion.

submitted 1 year ago* (last edited 1 year ago) by admin@lemmit.online to c/about@lemmit.online

I'd like to hear some feedback on this, or suggestions for other approaches.

Right now the bot is rather spammy. I was hoping that by using Reddit's Hot feed, it would have some level of quality control (I know, right?). Unfortunately, it seems that in most cases it will just return anything that's new. The downside of this is that a lot of garbage gets through, and the bot spends a lot of time scraping the underlying page to get the details.

I propose to only archive Reddit posts that have a karma score of 5 or higher. For subs that hide the karma scores of posts for a certain time, posts would have to be at least 2 hours old, so that the Reddit moderators can weed out garbage on our behalf.
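A sketch of the proposed rule, under the assumption that the post data comes from Reddit's JSON listing, which carries a `hide_score` flag and a `created_utc` timestamp per post. The function name and the choice to rely purely on post age while the score is hidden are illustrative assumptions, not a finished design.

```python
import time
from typing import Optional

MIN_SCORE = 5
HIDDEN_SCORE_MIN_AGE = 2 * 60 * 60  # 2 hours, in seconds


def should_archive(post: dict, now: Optional[float] = None) -> bool:
    """Proposed rule: score >= 5, or at least 2 hours old while the score is hidden."""
    now = time.time() if now is None else now
    if post.get("hide_score"):
        # The score isn't visible yet, so fall back on age and give the
        # subreddit's moderators time to remove garbage first.
        return now - post["created_utc"] >= HIDDEN_SCORE_MIN_AGE
    return post.get("score", 0) >= MIN_SCORE


now = 1_688_800_000.0
print(should_archive({"hide_score": False, "score": 7, "created_utc": now}, now))         # True
print(should_archive({"hide_score": True, "score": 0, "created_utc": now - 3600}, now))  # False
print(should_archive({"hide_score": True, "score": 0, "created_utc": now - 7200}, now))  # True
```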

Do you folks have any thoughts on this?

Secondly, I want to put sticky comments on each community, with links to native Lemmy communities that cover the same subject. For this I would need some kind of API, or a master list of... oh, I see sub.rehab has just the thing I need. So expect that sometime this week :).

submitted 1 year ago* (last edited 1 year ago) by admin@lemmit.online to c/about@lemmit.online

See you on the other side!


So the update is done, but the bot was offline for 6 hours, and needed to catch up.

Unfortunately, another update slipped through, which switched the default feed from www.reddit.com to old.reddit.com. That has the side effect of changing all the URLs in the posts as well. On one hand this is great, because new Reddit sucks. On the other hand, this is terrible, because for every post the bot encounters, it checks whether it already exists on Lemmit... based on the URL.

So for every post the bot encountered, it went like "old.reddit.com/r/blabla/123? Haven't seen that one yet. There's a www.reddit.com/r/blabla/123, but that must be something completely different, let's post it again!"
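One way to avoid this class of bug is to normalise every Reddit URL to a single host before the duplicate check. The sketch below assumes the check is a plain string comparison on the stored post URL; `canonical_post_url` and the set of host variants are hypothetical, and this is not necessarily the fix that was deployed.

```python
from urllib.parse import urlsplit, urlunsplit

# Host variants that all point at the same Reddit post (assumed list).
REDDIT_HOSTS = {"reddit.com", "www.reddit.com", "old.reddit.com", "new.reddit.com"}


def canonical_post_url(url: str) -> str:
    """Rewrite any Reddit host variant to www.reddit.com for duplicate checks."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host in REDDIT_HOSTS:
        host = "www.reddit.com"
    # Normalise the trailing slash, force https, drop query string and fragment.
    path = parts.path.rstrip("/") + "/"
    return urlunsplit(("https", host, path, "", ""))


old = canonical_post_url("https://old.reddit.com/r/blabla/comments/123/title/")
new = canonical_post_url("https://www.reddit.com/r/blabla/comments/123/title")
print(old == new)  # True
```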

This also meant that the bot took over a minute and a half to update each community, because it takes a couple of seconds per post. When I went to bed last night, I figured it was just posting a lot of content because it had so much catching up to do. But this morning I realized something was off, because it still hadn't caught up.

Anyway, the fix is out now. Sorry for all the duplicates. I need coffee now.

submitted 1 year ago by admin@lemmit.online to c/about@lemmit.online

ChatGPT, write a post for the stuff that I have in my head and want to get out as an update.

Hmm. No brain implant yet. Guess I'll have to write this the hard way.

Syncing update

It has been an eventful week. I successfully deployed the initial version of smarter content syncing, and have made some adjustments to the algorithm since then. Most notably, communities with only 1 subscriber (the bot) will no longer receive updates, and communities with fewer than 5 subscribers or with a low posting frequency will only be updated twice a day. Furthermore, for the highest update priority (every 10 minutes), a community must have a minimum of 50 subscribers. Implementation details can be found in the decide_interval() method over here.
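Since the link to the real `decide_interval()` did not survive, here is a rough sketch of how the rules described above could be expressed. Only the subscriber limits (1, 5 and 50) and the 10- and 720-minute intervals come from this post; the 30, 60 and 120 minute tiers are the other intervals Lemmit uses, and every posts-per-day cut-off in the code is a made-up placeholder.

```python
from typing import Optional

# Placeholder cut-offs in posts per day: NOT the real values.
LOW_ACTIVITY = 1
MODEST_ACTIVITY = 5
MEDIUM_ACTIVITY = 20
HIGH_ACTIVITY = 50


def decide_interval(subscribers: int, posts_per_day: float) -> Optional[int]:
    """Return the sync interval in minutes, or None if the community is not synced."""
    if subscribers <= 1:
        return None  # only the bot itself is subscribed: no updates at all
    if subscribers < 5 or posts_per_day < LOW_ACTIVITY:
        return 720   # barely followed or quiet: twice a day
    if subscribers >= 50 and posts_per_day >= HIGH_ACTIVITY:
        return 10    # highest priority requires at least 50 subscribers
    if posts_per_day >= MEDIUM_ACTIVITY:
        return 30
    if posts_per_day >= MODEST_ACTIVITY:
        return 60
    return 120
```

Keeping the function pure (numbers in, minutes out) makes the thresholds easy to tweak and test without touching the scraper.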

Being a developer is fun

Meanwhile... Damnit, bot is stuck again.

2023-07-08 10:13:39,945 - utils.syncer - INFO - Scraping subreddit: bustynaturals. Last time  2:30:48 ago, interval 120 minutes
2023-07-08 10:13:40,653 - utils.syncer - INFO - 'latina bodies are the best' at https://www.reddit.com/r/BustyNaturals/comments/14twww8/latina_bodies_are_the_best/ updated: 2023-07-08 07:14:13+00:00
2023-07-08 10:13:45,324 - utils.syncer - ERROR - Error trying to retrieve post details, try again in a bit; Couldn't retrieve post detail page
2023-07-08 10:13:46,333 - utils.syncer - INFO - Scraping subreddit: bustynaturals. Last time  2:30:54 ago, interval 120 minutes
2023-07-08 10:13:48,581 - utils.syncer - INFO - 'latina bodies are the best' at https://www.reddit.com/r/BustyNaturals/comments/14twww8/latina_bodies_are_the_best/ updated: 2023-07-08 07:14:13+00:00
2023-07-08 10:13:51,227 - utils.syncer - ERROR - Error trying to retrieve post details, try again in a bit; Couldn't retrieve post detail page
...

1 bugfix and deployment later:

2023-07-08 10:46:42,836 - utils.syncer - INFO - Scraping subreddit: bustynaturals. Last time  3:03:51 ago, interval 120 minutes
2023-07-08 10:46:43,573 - utils.syncer - INFO - 'latina bodies are the best' at https://www.reddit.com/r/BustyNaturals/comments/14twww8/latina_bodies_are_the_best/ updated: 2023-07-08 07:14:13+00:00
2023-07-08 10:46:48,327 - utils.syncer - ERROR - Couldn't find post on https://old.reddit.com/r/BustyNaturals/comments/14told8/latina_bodies_are_the_best/, skipping.
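The first log appears to show the bot retrying the same post over and over. A minimal sketch of the general fix, capping the attempts and then skipping the post, might look like the following. `fetch_with_retries` and the `fetch` callable are hypothetical stand-ins for whatever the bot uses to download the post detail page; the logger name just mirrors the log output.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("utils.syncer")


def fetch_with_retries(fetch: Callable[[str], Optional[dict]], url: str,
                       max_attempts: int = 3) -> Optional[dict]:
    """Try to fetch post details a few times, then give up and skip the post."""
    for attempt in range(1, max_attempts + 1):
        try:
            details = fetch(url)
        except Exception as exc:  # network errors, parse errors, ...
            logger.warning("Attempt %d/%d for %s failed: %s",
                           attempt, max_attempts, url, exc)
            continue
        if details is not None:
            return details
    logger.error("Couldn't find post on %s, skipping.", url)
    return None
```

In a real bot you would probably also wait a little between attempts; that is left out here to keep the sketch short.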

Defederation

Meanwhile, the folks at https://lemmy.world reached out to me to tell me they're defederating Lemmit. They are not fond of the high volume of posts made by the bot, or of the fact that there are now (quick check) 462 communities on this server, all moderated by a single person. They have already received a couple of complaints about spam, and it didn't help that some requests for NSFW subreddits were not marked as NSFW. Occasionally, those request posts had explicit thumbnails that appeared in the 'All' feed without warning.

I had a good talk with the LemmyWorld admin, wherein they explained their point of view, and I explained mine. I understand their decision to dissociate from Lemmit, and appreciate their attempt to contact me. Other instances like Beehaw, and some smaller ones, have also reached the same decision.

This does mean that you will no longer be able to get new community updates on those servers. So make sure to check the blocked instances list on your home server if you were subscribed to Lemmit. At the same time, I have removed all the subscriptions of users from those servers, so they don't affect the sync priority mentioned above. This also means that if LemmyWorld, Beehaw, etc. ever decide to connect to Lemmit again (however unlikely), you will need to unsubscribe and resubscribe from there.

Meanwhile, I've added a feature in the bot that will remove request posts for NSFW subreddits if the post itself is not marked as NSFW. This should prevent explicit thumbnails showing up where they are not wanted.
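The check itself is simple. This sketch assumes the subreddit's NSFW status is read from the `over18` field in Reddit's `/r/<sub>/about.json`, and that the request post's flag is Lemmy's per-post `nsfw` field; both function names are hypothetical.

```python
import json


def subreddit_is_nsfw(about_json: str) -> bool:
    """Read the over18 flag from the body of /r/<sub>/about.json."""
    return bool(json.loads(about_json)["data"].get("over18", False))


def should_remove_request(subreddit_over18: bool, request_post_nsfw: bool) -> bool:
    """A request for an NSFW subreddit must itself be marked NSFW, or it goes."""
    return subreddit_over18 and not request_post_nsfw


about = '{"kind": "t5", "data": {"display_name": "example", "over18": true}}'
print(should_remove_request(subreddit_is_nsfw(about), request_post_nsfw=False))  # True
```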

Server growth

Last night I got an alert from my server monitoring that the disk is 80% full. Unfortunately, the disk is only 60 GB, so that doesn't leave much room for expansion. On the bright side, a good chunk of that is from Lemmy's very verbose logging (like, 4 GB a day, which gets cleaned up daily), so it should last through the weekend if I tune that down. Furthermore, most of the storage growth comes from pict-rs, the image upload part of Lemmy, and that can use an S3 bucket rather than the VM's storage like it does now. An S3 bucket is a cost-efficient way to expand storage: initial estimates indicate a monthly cost of around $5 for 1000 GB of storage, which should be sufficient for a while *fingers crossed*.
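For reference, on a 60 GB disk the 80% mark means 48 GB used and only 12 GB free. The monitoring tool in use isn't named here, so the following is just a minimal stand-in using Python's `shutil.disk_usage`; the threshold and function names are illustrative.

```python
import shutil

ALERT_THRESHOLD = 0.80  # warn once the disk is 80% full


def should_alert(used: float, total: float, threshold: float = ALERT_THRESHOLD) -> bool:
    """True when the used fraction of the disk reaches the threshold."""
    return used / total >= threshold


def check_disk(path: str = "/") -> bool:
    """Check the real filesystem that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return should_alert(usage.used, usage.total)


print(should_alert(50, 60))  # True: about 83% of a 60 GB disk
print(should_alert(40, 60))  # False: about 67%
```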

In the early days of Lemmit (literally, as the server is less than a month old) image uploads were limited to a default setting, which was something around 40 megabytes. That did add up quickly (thanks to half-minute porn gifs), and so I had to limit the max filesize to 1 MB, and later 0.5 MB. Once the server has switched to S3 storage, I can probably up that limit a little, although not too much.

Finally, Lemmy v0.18.1 has been released, and it contains even more performance boosts compared to v0.18.0, so if there's time left this weekend (and I can verify the Lemmit Bot is compatible), I will probably perform the upgrade.

admin@lemmit.online 6 points 1 year ago

Congrats on reaching this set of sane rules. The efforts of creating an admin community behind the scenes are really starting to show.

Request for clarification for, uhmm, a friend of mine: when someone creates their own instance, with blackjack and hookers, and one of your users subscribes to a community there, part of that content will be synchronised to lemmynsfw. What will you do then?

I'd like to remind you that some beautiful maniacs can be quite reasonable ;)

admin@lemmit.online 2 points 1 year ago

Yeah, I've upped the limit on this server, so it should come through now if you retry.


You know, on account of me upping that one setting in the admin panel, which I should have thought of long ago.

admin@lemmit.online 2 points 1 year ago

That could work, but it would be terrible for discoverability. In the meantime, I put up a feature request at Lemmy. I'm not a fan of pushing my problems upstream, but in this case it would actually be the easiest solution: as far as I can see (and I have zero experience with Rust), they only need to adjust the validation regex, because the database already allows for it. That is, as long as the ActivityPub protocol allows for it.

If they deny it, I could try something with name mapping, but you'd end up with either something unreadable or something with a high collision chance. Neither option is very appealing. For now I'm just going to wait and see.
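To make the trade-off concrete, here is a sketch assuming the problem is a maximum community name length (20 characters is assumed here, one short of Reddit's 21-character subreddit names). Plain truncation stays readable but collides easily; appending a short hash avoids most collisions at the cost of readability. Both functions are hypothetical.

```python
import hashlib

MAX_LEN = 20  # assumed community name limit; Reddit allows 21 characters


def truncate_name(name: str, max_len: int = MAX_LEN) -> str:
    """Readable, but long names that share a prefix collide."""
    return name[:max_len]


def hashed_name(name: str, max_len: int = MAX_LEN, digest_len: int = 6) -> str:
    """Prefix plus a short hash: far fewer collisions, much less readable."""
    if len(name) <= max_len:
        return name
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()[:digest_len]
    prefix = name[: max_len - digest_len - 1]
    return f"{prefix}_{digest}"


a, b = "abcdefghijklmnopqrst1", "abcdefghijklmnopqrst2"  # 21 characters each
print(truncate_name(a) == truncate_name(b))  # True: both become the same name
print(hashed_name(a), hashed_name(b))
```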

admin@lemmit.online 2 points 1 year ago

I have considered some technical solutions, and I agree that this sub would be an excellent candidate for archiving. For now I have made a feature request at Lemmy because, let's face it, that would solve several problems.

If they aren't up for it, I could try to fix it some other way, but ideally they would just allow 1 more character than they do now.

submitted 1 year ago* (last edited 1 year ago) by admin@lemmit.online to c/about@lemmit.online

Okay, this one took me a bit longer than I planned (mostly due to sql fun and trying to use integers as minutes, WEEEE!).

Backdrop: Last week I disabled the mirroring of a couple of subreddits to the database, because they were initially requested but then nobody subscribed to them. At the same time, the bot was just crawling in a loop, starting at todayilearned and ending at latestsubreddit. As more subreddits were requested, this loop took longer and longer (21 minutes before I rolled out this update). This wasn't sustainable.

So here's the new situation. The more popular a community is, the more often it will be updated. In this case, popular means a mixture of the number of subscribers and the number of posts it receives per day (Link to relevant snippet of source code).

In short, the most popular subs will be synced every 10 minutes, the next tiers every 30, 60 or 120 minutes, and communities with either no posts per day or no subscribers (other than the bot) will only be synced every 12 hours. I hope this will hit a good distribution of updates vs popularity, but it will most likely be refined at some point in the future.
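As a sketch of the "is this community due?" check, mirroring the log lines that print "Last time ... ago, interval 120 minutes": the interval is stored as integer minutes and converted with `timedelta(minutes=...)` before comparing. `is_due` and its parameters are hypothetical, and `None` is used here to mean "never synced" or "syncing disabled".

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_due(last_synced: Optional[datetime], interval_minutes: Optional[int],
           now: Optional[datetime] = None) -> bool:
    """True when a community's interval (integer minutes) has elapsed."""
    if interval_minutes is None:
        return False  # syncing disabled for this community
    if last_synced is None:
        return True   # never synced yet
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_synced >= timedelta(minutes=interval_minutes)


now = datetime(2023, 7, 8, 10, 13, 39, tzinfo=timezone.utc)
print(is_due(now - timedelta(hours=2, minutes=30, seconds=48), 120, now))  # True
```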

Speaking of distribution, we now have over 300 communities on this server 🥳, and their update intervals are spread out as such:

  • Every 10 minutes: 22
  • Every 30 minutes: 39
  • Every 60 minutes: 55
  • Every 120 minutes: 143
  • Every 720 minutes: 44
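A quick back-of-the-envelope check of what this distribution means for load, assuming each community is synced exactly on its interval. The comparison with the old loop is rough and assumes the 21-minute round-robin pass covered the same 303 communities.

```python
# Communities per sync interval (minutes), taken from the list above.
distribution = {10: 22, 30: 39, 60: 55, 120: 143, 720: 44}

MINUTES_PER_DAY = 24 * 60

communities = sum(distribution.values())
syncs_per_day = sum(count * MINUTES_PER_DAY // interval
                    for interval, count in distribution.items())

# Old approach: one pass over every community roughly every 21 minutes.
old_syncs_per_day = communities * MINUTES_PER_DAY / 21

print(communities, syncs_per_day, round(old_syncs_per_day))  # 303 8164 20777
```

So the tiered schedule does roughly 40% of the scraping work of the old loop, while the busiest communities get refreshed about twice as often.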

With this update running live (I started typing after I deployed it, and it has now gotten through the backlog of 'abandoned' subs), I'm going to step back from feature development for a few days. Any bugs that cause the bot to crash will of course continue to be addressed.

Have a blast!

admin@lemmit.online 2 points 1 year ago

...

of course this exists.

(I'm not complaining)

admin@lemmit.online 2 points 1 year ago

👍 Fair enough. I just want to prevent people requesting things, deciding it's not what they wanted, and then have the bot keep it up to date for nothing.

admin@lemmit.online 2 points 1 year ago

Normally it does (see https://lemmit.online/comment/490). Not sure why it didn't here, though :(

submitted 1 year ago by admin@lemmit.online to c/about@lemmit.online

Before, it was running on the cheapest model (1 core / 1GB mem / 30GB storage) at $12/month. The machine was running pretty low on memory, causing it to start swapping, which in turn made the CPU too busy and slowed everything down.

Now it has a whopping 2GB of memory, and things seem to have calmed down: CPU is back to around 10-15% usage, and swap is down to 0. Happy times all around.

Because of the amount of subs being archived, it now takes about 15 minutes between updates for each sub (was 18 before I updated the VM).

I'm planning to build some kind of scoring system, based on the number of posts per subreddit (per day?) and the number of subscribers on the Lemmy community. That way, communities with few subscribers, or that don't see many posts per day, will only be updated once per hour.

At the same time, I feel that subs like AskReddit, OutOfTheLoop and other "question-based" subreddits shouldn't be archived by Lemmit. In my opinion those kinds of posts are useless without their answers, but please let me know if you disagree.

Bug fixes 24-06-2023 (lemmit.online)
submitted 1 year ago by admin@lemmit.online to c/about@lemmit.online
  • Fixed a bug where posts would not be submitted because the title didn't contain long enough words.
  • Fixed a bug where posts would not be submitted because the url was too long.
  • Fixed a bug where posts would not be submitted when they linked to a /user subreddit.
  • Fixed a bug where the bot would think Every Post Everywhere was a subreddit request, and would reply to it.
  • Fixed a bug where the bot would crash without recovering whenever something went wrong during new subreddit requests.
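The last fix boils down to a common pattern: handle each request in isolation, so one bad request gets logged and skipped instead of taking the whole bot down. A minimal sketch, where `process_requests`, `handle` and the logger name are hypothetical:

```python
import logging
from typing import Callable, Iterable, List

logger = logging.getLogger("bot.requests")  # hypothetical logger name


def process_requests(requests: Iterable[str], handle: Callable[[str], None]) -> List[str]:
    """Handle each subreddit request; a failure is logged instead of killing the loop."""
    failed = []
    for request in requests:
        try:
            handle(request)
        except Exception:
            logger.exception("Failed to process request %r, continuing", request)
            failed.append(request)
    return failed
```

Returning the failed requests makes it easy to retry them on the next pass.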

A fruitful day all in all, I'd say.

Please don't tell me (lemmit.online)
submitted 1 year ago by admin@lemmit.online to c/about@lemmit.online

That the replies-everywhere-bug was just because I forgot to include a variable in the bot deployment? 🤦

admin@lemmit.online 10 points 1 year ago

For what it's worth, as the creator of lemmit.online, I totally understand people not wanting to see automated content bots. I, for one, wouldn't want to see them mixed with regular content. That is why I made sure to put it on its own instance, and not allow any users, so there would be minimal harm if a server decided to defederate.

And yes, NSFW content is allowed on my server :). For more answers, see this FAQ post. For more questions, please post them in that comment thread there.

submitted 1 year ago* (last edited 1 year ago) by admin@lemmit.online to c/about@lemmit.online

In the short time since this instance and bot launched, I've been seeing the same questions resurface multiple times. This is totally understandable, since the concept of a Fediverse is still new to most (myself included), and this server is not like the others.

Q: What is Lemmit?

A: Lemmit is a Lemmy instance specifically designed for archiving Reddit content. Users can request new subreddits to be included in the archiving process by posting in the !requests@lemmit.online community. It is powered by an open-source Python bot, which periodically checks the request list, adds new requests to the queue, and continuously monitors the Hot feed of those subs for new posts to cross-post here.

Q: Does it synchronize comments?

A: No, that would be impossible. Considering there are thousands of posts already on Lemmit, many of them having at least several hundred comments on Reddit, often buried in deep layers, it simply wouldn't be feasible to index those for more than a few posts, let alone keep them up to date.

Unfortunately, this means that archiving certain subreddits, such as Ask Historians/Men/Women/Hyperintelligentshadesofthecolourblue-type subs, is going to be rather pointless.

Q: Can it send comments back to Reddit?

A: No, it cannot. The purpose is to help bootstrap the Lemmy platform, not to serve as a bridge between the two networks. Also, see the answer about synchronizing comments.

Q: Can I request any subreddit?

A: Technically, yes. However, as the list of subs grows, the time it takes to update all of them will also increase. I do not have strict guidelines in place for this, so I'm relying on your common sense (hoooo boy). At some point, I will probably have to either stop accepting new requests or disable scraping for very low-traffic communities.

Q: Does this use the API? Will it keep working after July 1st?

A: Nope, it uses a combination of the public feed and scraping old.reddit.com. So, as long as those are still available, it will continue working. And even if they close those sources, there will probably be new ways to achieve the same effect. "Content, eh, finds a way."

Q: This is spam, can you stop?

A: First of all, I apologise for the inconvenience. All you have to do is block @bot@lemmit.online, and none of its posts will ever show up on your instance. If you don't want anyone else on your server to be exposed to this bot/instance, you should convince your admin to defederate from lemmit.online. Since there are no other users on here, there will be no harm done.

Obviously I could stop, because running this server and software is only ever going to cost me time and money. But for the reasons listed above, I still think this server is a useful addition to the lemmyverse at this time. That said, I'm looking forward to the day when I can turn the bot off because it's no longer needed.

Q: What started this?

A: Okay, nobody asked this, but I'm going to tell you anyway. After Reddit made it clear that they are effectively killing third-party apps and making plenty of other anti-end-user decisions, I realized that I would either have to accept not being able to access my time-wasting content or have to do so in a rather uncomfortable way (either through the official app or old.reddit.com for as long as they'll allow it to exist).

Being a stubborn developer, naturally, I chose option C: Have my own Reddit. With blackjack, and hookers. This way, I would still be able to access my beloved content without being beholden to Reddit's mood swings and abusive relationship tendencies.

Besides that, I also know that Content is King. So in order to counter the network effect (no users because no content, no content because no users), I figured it would be better to have some inorganic content to bootstrap the adoption of Lemmy.

Q: Are NSFW subreddits allowed?

A: Absolutely. Like I said: Blackjack and hookers.

Q: My request isn't picked up by the bot!

A: That isn't a question. But yeah, the process isn't flawless yet. I'm trying to iron out all the bugs as I encounter them. In the meantime, feel free to re-request the subreddit by making a second post. No harm done.

Q: No new posts are showing up at all on Lemmit

A: If no posts are appearing on the Lemmit Frontpage (sorted by NEW), it's possible that the bot has crashed or is stuck on something. Since no software is flawless, this sometimes happens. I usually fix this as soon as I'm aware, and I'm happy to say that these kinds of fatal errors are becoming less and less frequent. However, they may still occur, and as a human with needs of sleep and other responsibilities, I'm not always able to fix them immediately.
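The same "no new posts" heuristic can be automated. This sketch assumes you already have the timestamp of the newest post on Lemmit (how you fetch it is left out), and the one-hour threshold and the function name are arbitrary illustrations.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

STALE_AFTER = timedelta(hours=1)  # illustrative threshold


def bot_looks_stuck(newest_post_at: datetime, now: Optional[datetime] = None,
                    stale_after: timedelta = STALE_AFTER) -> bool:
    """True if nothing new has been posted for longer than `stale_after`."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - newest_post_at > stale_after


now = datetime(2023, 7, 8, 12, 0, tzinfo=timezone.utc)
print(bot_looks_stuck(now - timedelta(hours=3), now))     # True
print(bot_looks_stuck(now - timedelta(minutes=15), now))  # False
```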

Q: Posts aren't showing up on my instance, what's up?

A: Due to the spammy nature of the bot, some server admins choose to block this server, and that is completely understandable. So first of all, make sure to check the instances link in the footer of your home server. If Lemmit is in the Blocked Instances list, you're out of luck.

When you have verified that Lemmit is not blocked on your instance, try unsubscribing, waiting a little, and then re-subscribing. That tends to fix things.

submitted 1 year ago by admin@lemmit.online to c/about@lemmit.online

Long story short: I messed up with the domain registration for this instance, and never replied to a mandatory email. The domain name (lemmit.online) was suspended, causing disconnects all over the fediverse.

I fixed it as soon as I found out, but it will probably take a few more hours for the issues to be fully resolved.

So ehm. Whoops. Hope this explains and fixes the federation issues we've been having today.

admin@lemmit.online 3 points 1 year ago

If that's what happens, that's what happens. ¯\_(ツ)_/¯

I'm just here to offer a service for people who Do like it.

admin@lemmit.online 2 points 1 year ago

Yups. It's all done by one bot though, so you'll just have to block that to get rid of them.


admin

joined 1 year ago