114
submitted 2 months ago by supakaity to c/main

Hi all our lovely users,

Just a quick post to let you all know that alongside the upgrade to 0.19.3, we've also added a couple of alternate UIs to the Blåhaj lemmy for you.

Obviously the default lemmy-UI at https://lemmy.blahaj.zone still exists and has been updated to 0.19.3 alongside the lemmy server update.

There's also now an Alexandrite UI at https://alx.lemmy.blahaj.zone, which is a more modern, smoother UI, written in Svelte, by sheodox.

And then for those who are nostalgic for reddit days of yore, and memories of when PHP websites last ruled the earth, there's MLMYM (courtesy of rystaf) at https://mlmym.lemmy.blahaj.zone.

Please enjoy, and I hope the upgrades work well for you.

25
The butterfly, very pretty! (lemmy.blahaj.zone)
submitted 3 months ago by supakaity to c/main

This is a test

46
Testing image upload (lemmy.blahaj.zone)
submitted 5 months ago by supakaity to c/main

Test

[-] supakaity 31 points 7 months ago

Migration has been completed!

116
submitted 7 months ago* (last edited 7 months ago) by supakaity to c/main

We're currently in the process of migrating our pict-rs service (the thing responsible for storing media/images/uploads etc) to the new infrastructure.

This involves an additional step of moving our existing file-based storage to object storage, so this process will take a little time.

New images/uploads may not work properly during this migration, however existing images should continue to load. We expect this migration to take about an hour.

[EDIT]

Migration has completed.

685,271 files / 153.38 GB were migrated. Copying to object storage took about 1.5 hours. Starting service back up on new server and debugging took another 30 minutes.

Timeline:

  • Migration started at 2023-10-01 22:43 UTC.
  • [+1h32m] Objects finished uploading to object storage at 2023-10-02 00:15 UTC.
  • [+2h06m] Migration was completed at 2023-10-02 00:46 UTC.
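
For anyone who wants to check the numbers, here is a minimal sketch that recomputes the offsets and the average copy rate from the timestamps and totals above. It assumes 1 GB = 1000 MB and that the whole 153.38 GB moved during the upload phase. Note that the timestamps work out to +2h03m for the final step, slightly different from the +2h06m listed, so one of the two figures is probably a few minutes off.

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%d %H:%M"


def parse_utc(stamp: str) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM' timestamp and mark it as UTC."""
    return datetime.strptime(stamp, FMT).replace(tzinfo=timezone.utc)


def fmt_offset(delta: timedelta) -> str:
    """Format an elapsed time in the '+1h32m' style used in the timeline."""
    minutes = int(delta.total_seconds() // 60)
    return f"+{minutes // 60}h{minutes % 60:02d}m"


# Timestamps taken from the timeline in the post.
started = parse_utc("2023-10-01 22:43")
uploaded = parse_utc("2023-10-02 00:15")
completed = parse_utc("2023-10-02 00:46")

upload_offset = fmt_offset(uploaded - started)
total_offset = fmt_offset(completed - started)

# Totals from the post; the GB-to-MB factor of 1000 is an assumption.
TOTAL_GB = 153.38
TOTAL_FILES = 685_271
upload_seconds = (uploaded - started).total_seconds()

avg_mb_per_s = TOTAL_GB * 1000 / upload_seconds
avg_files_per_s = TOTAL_FILES / upload_seconds

print(upload_offset, total_offset)
print(f"{avg_mb_per_s:.1f} MB/s, {avg_files_per_s:.0f} files/s")
```

That comes out at a bit under 28 MB/s on average, or roughly 124 files per second.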
[-] supakaity 23 points 8 months ago

You are super welcome, lovely.

It brings Ada and me a whole heap of pleasure running these instances, and it's largely knowing that we're making a difference to our users, and providing a safe space for you all to grow and flourish, that makes it all worth it for us.

blobhaj, hug, tinybla

79
submitted 8 months ago* (last edited 8 months ago) by supakaity to c/main

Blåhaj Lemmy will be down for database migration to the new servers in approximately 1.5 hours from now (06:00 UTC).

Downtime is estimated at under an hour.

I will have more details on the maintenance page during the migration and update the status as the migration progresses.

[-] supakaity 75 points 9 months ago

I have been watching my love tie herself in knots over the last several days, having to deal with the drama that has been brought on, trying her best to bring everyone back together.

There's been bad behaviour from both sides, and I'm really disappointed to see that some of the worst of it came from our users, who didn't keep to the moral high ground, disregarded our instance rules and stooped to levels of behaviour worse than that leveled against them.

There have been accusations against us (or Ada specifically) that we are a safe harbour for bad behaviour and cause harm to trans people through our inaction.

This is perhaps the cruelest accusation they could have leveled at Ada, as she works tirelessly to maintain a safe space for our community, and while I was hoping, for all the effort that she was investing into this issue, that she could make it work despite my own reservations, this last attack on her impeccable morality has made me very angry.

I'm sorry for those that wanted to remain federated, sorry that it came to this, but I am glad it's over now, purely for the mental health of my precious beloved.

[-] supakaity 25 points 9 months ago

Okay, so that was way more painful than expected... /sigh

59
submitted 9 months ago by supakaity to c/main

The server will be briefly down while we install a new updated version of lemmy and restart it.

The maintenance window is 15 minutes, but should be much shorter.

321
submitted 9 months ago by supakaity to c/main

So it's been a few days, where are we now?

I also thought given the technical inclination of a lot of our users that you all might be somewhat interested in the what, how and why of our decisions here, so I've included a bit of the more techy side of things in my update.

Bandwidth

So one of the big issues we had was the heavy bandwidth caused by a massive amount of downloaded content (not in terms of storage space, but multiple people downloading the same content).

In terms of bandwidth, we were seeing the top 10 single images resulting in around 600GB+ of downloads in a 24 hour period.

This has been resolved by setting up a frontline caching server at pictrs.blahaj.zone, which is sitting on a small, unlimited 400Mbps connection, running a tiny Caddy cache that is reverse proxying to the actual lemmy server and locally caching the images in a file store on its 10TB drive. The nginx in front of lemmy is 301 redirecting internet facing static image requests to the new caching server.
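
To make the redirect step concrete, here is a rough Python sketch of the decision the nginx rule makes. It is only an illustration, not our actual config: the `/pictrs/image/` path prefix and the function name are assumptions, and only the `pictrs.blahaj.zone` host comes from the setup described above.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

CACHE_HOST = "pictrs.blahaj.zone"  # caching server named in the post
IMAGE_PREFIX = "/pictrs/image/"    # assumed path prefix for static images


def redirect_target(url: str) -> Optional[str]:
    """Return the URL to 301-redirect to, or None to serve the request as-is."""
    parts = urlsplit(url)
    if parts.hostname == CACHE_HOST:
        # Already on the cache host: redirecting again would loop.
        return None
    if not parts.path.startswith(IMAGE_PREFIX):
        # Not a static image request: leave it for the lemmy server.
        return None
    # Same path and query string, but pointed at the caching server.
    return urlunsplit(("https", CACHE_HOST, parts.path, parts.query, ""))


print(redirect_target("https://lemmy.blahaj.zone/pictrs/image/abc.png?thumbnail=256"))
print(redirect_target("https://lemmy.blahaj.zone/api/v3/site"))
```

The cache then fetches each image from the lemmy server once and serves every repeat download from its own disk, which is where the bandwidth saving comes from.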

This one step alone is saving over $1,500/month.

Alternate hosting

The second step is to get away from RDS and our current fixed instance hosting to a stand-alone and self-healing infrastructure. This has been what I've been doing over the last few days, setting up the new servers and configuring the new cluster.

We could be doing this cheaper with a lower cost hosting provider and a less resilient configuration, but I'm pretty risk averse and I'm comfortable that this will be a safe configuration.

I wouldn't normally recommend this setup to anyone hosting a small or single user instance, as it's a bit overkill for us at this stage, but in this case, I have decided to spin up a full production grade Kubernetes cluster with a stacked etcd inside a dedicated HA control plane.

We have rented two bigger dedicated servers (64GB, 8 CPU, 2TB RAID 1, 1 Gbps bandwidth) to run our 2 databases (main/standby), redis, etc. on. Then the control plane is running on 3 smaller instances (2GB, 2 CPU each).

All up this new infrastructure will cost around $9.20/day ($275/m).

Current infrastructure

The current AWS infrastructure is still running at full spec and (minus the excess bandwidth charges) is still costing around $50/day ($1500/m).
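
As a quick sanity check on those figures, here is a small sketch comparing the two daily rates. The 30-day month is an assumption, and the excess bandwidth charges are left out on both sides.

```python
DAYS_PER_MONTH = 30  # assumption: a flat 30-day month

# Daily costs quoted in the post, in USD.
NEW_DAILY = 9.20
CURRENT_AWS_DAILY = 50.00


def monthly(daily_cost: float) -> float:
    """Convert a daily cost into an approximate monthly cost."""
    return round(daily_cost * DAYS_PER_MONTH, 2)


new_monthly = monthly(NEW_DAILY)
current_monthly = monthly(CURRENT_AWS_DAILY)
saving = current_monthly - new_monthly
saving_pct = saving / current_monthly * 100

print(f"new: ${new_monthly:.0f}/m, current: ${current_monthly:.0f}/m")
print(f"saving: ${saving:.0f}/m ({saving_pct:.1f}%)")
```

So once everything has moved, the base infrastructure bill should drop by a bit over 80%.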

Migration

Apart from setting up kubernetes, nothing has been migrated yet. This will be next.

The first step will be to get the databases off the AWS infrastructure, which will be the biggest bang for buck as the RDS is costing around $34/day ($1,000/m).

The second step will be the next biggest machine which is our Hajkey instance at Blåhaj zone, currently costing around $8/day ($240/m).

Then the pictrs installation, and lemmy itself.

And finally everything else will come off and we'll shut the AWS account down.

[-] supakaity 55 points 9 months ago

So, one thing I'd mention is the systems and admin work involved in running an instance.

This is on top of the community moderation, and involves networking with other instance admins, maintaining good relations, deciding who to defederate from, dealing with unhappy users, etc.

Then there's the setup and maintenance of the servers, security, hacks, DDoSing, backups, redundancy, monitoring, downtime, diagnosis, fixing performance issues, patching, coding, upgrades etc.

I wouldn't be here doing this without @ada. We make a formidable team, and without any self effacement, we are both at the top of our respective roles with decades of experience.

Big communities also magnify the amount of work involved. We're almost at the point where we are starting to consider getting additional people involved.

Moreover we're both here for the long haul, with the willingness and ability to personally cover the shortfall in hosting costs.

I'm not trying to convince you to stay here. But in addition to free hardware, you're going to need a small staff to do these things for you, so my advice is to work out if you have reliable AND trustworthy people (because these people will have access to confidential user data) who are committed to do this work long term with you. Where will you be in 3 years, 5, 10?

[-] supakaity 92 points 9 months ago

To be clear, $3k is an accurate, but unacceptable amount.

As in that's what it's actually costing us, but it's not what it should be costing. I'd imagine more like $250 is what we should be paying if I wasn't using AWS in the silly way I am.

I'm admitting up front that I've been more focused on developing rather than optimising operating costs because I could afford to be a little frivolous with the cost in exchange for not having to worry about doing server stuff.

Even when the Reddit thing happened I was wilfully ignoring it, trying to solve the scaling issues instead of focusing on the increased costs.

And so I didn't notice when Lemmy was pushing a terabyte of data out of the ELB a day. And that's what got me.

About half that $3k is just data transfer costs.

Anyhow the notice was just to let our users know what is going on and that there'll be some maintenance windows in their future so it doesn't surprise anyone.

We have a plan and it will all work out.

Don't panic or have any kneejerk reactions, it's just an FYI.

[-] supakaity 62 points 9 months ago

Just want to say, I don't blame anyone else but myself.

I certainly don't blame anyone at 196.

I hope I'm really clear about that. It's one of the reasons I specifically didn't name 196 in my announcement.

We've got a solution planned, we've already started to implement it and have the image transfer issue solved already.

We can afford to cover this ridiculous AWS bill, I just need to do some maintenance work so this doesn't continue because I can't continue to line Jeff Bezos' pockets like this indefinitely.

158
submitted 9 months ago by supakaity to c/main

Discussion of the current situation with the Blåhaj instances, and upcoming maintenance.

99
submitted 10 months ago by supakaity to c/main

Our lemmy is now running the 0.18.2 release version, which should fix some lingering issues we've been having.

Let @ada or me know if there are any issues!

[-] supakaity 29 points 10 months ago

Migration complete.

64
submitted 10 months ago by supakaity to c/main

Hi everyone, I'll begin migrating the lemmy blåhaj database to the new server this morning in about 20 minutes.

Expected duration is about 1 hour for this migration.

There will be a maintenance page up during the migration and I will be updating the status as we go to keep you updated on the process.

Later today I'll also be upgrading the software to the latest release as well.

[-] supakaity 28 points 10 months ago

It wasn't an actual emojo. The script processed the SQL header column names as an emojo and tried to add them. Unfortunately "publicUrl" is not a valid URL, so lemmy's /api/v3/site metadata endpoint started returning the error "relative URL without a base" instead of the JSON that the website was expecting, and so the site just stopped working for everyone the next time it tried to load that URL.
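
For illustration, here is a minimal Python sketch of the kind of guard that avoids this class of bug. It is not the actual copy script: the CSV-style export, the `name` column and the helper names are all assumptions, and only the `publicUrl` column name comes from the incident.

```python
import csv
import io
from urllib.parse import urlsplit


def is_absolute_http_url(value: str) -> bool:
    """True only for absolute http(s) URLs, so a bare word like 'publicUrl' fails."""
    parts = urlsplit(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def load_emojos(export_text: str) -> list:
    """Parse an export, consuming the header row and dropping invalid URLs."""
    # DictReader treats the first row as column names instead of data.
    reader = csv.DictReader(io.StringIO(export_text))
    valid = []
    for row in reader:
        url = (row.get("publicUrl") or "").strip()
        if is_absolute_http_url(url):
            valid.append({"name": row.get("name", ""), "publicUrl": url})
    return valid


# Hypothetical export: one good row and one row with a bogus URL.
sample = (
    "name,publicUrl\n"
    "blobhaj,https://example.invalid/blobhaj.png\n"
    "broken,publicUrl\n"
)
emojos = load_emojos(sample)
print(emojos)
```

Either check on its own (skipping the header or validating the URL) would have kept the bad row out of the database.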

153
submitted 10 months ago by supakaity to c/main

I have a process which copies emojos from the hajkey blahaj.zone site to our lemmy.

Last night (for me) it copied an invalid emojo into the database and it broke lemmy for hours while Ada and I were sleeping.

I've found the issue and fixed it now, so everything should be purring along once again, but for those who were left without the instance to use, I sincerely apologise.

blobcat, sorry

[-] supakaity 30 points 10 months ago* (last edited 10 months ago)

How do you know I haven't always been the hacker who's in control? :D

281
submitted 10 months ago* (last edited 10 months ago) by supakaity to c/main

Hi everyone.

Lemmy had an XSS vulnerability and we were one of the instances attacked through this vulnerability. We had our homepage replaced by a YouTube video.

The lemmy devs were quick to create a PR for this issue and we have patched our instance to fix it.

Also while I do not believe any login tokens were compromised, we have taken the additional measure of rotating our secret key, so everyone will have to login to the site again as your existing credentials will no longer be valid.
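
If you're wondering why rotating the key logs everyone out, here is a simplified sketch. It assumes login tokens are signed with the server's secret key, and it uses a plain HMAC from the Python standard library as a stand-in rather than lemmy's real token format; the function names and the user ID are made up for the example.

```python
import hashlib
import hmac
import secrets


def sign(user_id: str, secret: bytes) -> str:
    """Issue a token of the form '<user_id>.<hex signature>'."""
    mac = hmac.new(secret, user_id.encode(), hashlib.sha256).hexdigest()
    return f"{user_id}.{mac}"


def verify(token: str, secret: bytes) -> bool:
    """Check that the token's signature was produced with this secret."""
    user_id, _, mac = token.rpartition(".")
    expected = hmac.new(secret, user_id.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(mac, expected)


old_secret = secrets.token_bytes(32)
token = sign("example_user", old_secret)  # issued before the rotation

new_secret = secrets.token_bytes(32)      # the rotated key

valid_before = verify(token, old_secret)
valid_after = verify(token, new_secret)
print(valid_before, valid_after)
```

Any token issued under the old key, including one an attacker might have captured, stops verifying as soon as the server switches to the new key.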

If login fails to work properly, you may have to clear your cookies for the site... it seems to get confused about whether you're logged in or not.

Sorry that this happened, if you have any questions please contact @ada@lemmy.blahaj.zone or myself.

blobcat, heart

[-] supakaity 22 points 10 months ago

Database upsize completed. Still working on 0.18.1 upgrade, will be one further (software) restart in about an hour and then we'll be done and running the latest release candidate.

[-] supakaity 23 points 10 months ago

Okay, I'm taking down the server now to upsize the instance hardware.


supakaity

joined 1 year ago