submitted 2 months ago* (last edited 2 months ago) by sunaurus@lemm.ee to c/meta@lemm.ee

Hey folks!

Unfortunately, roughly 2 hours ago, lemm.ee went offline. The cause was our load balancer: it suddenly decided that all of our servers had become unhealthy, despite all health checks responding successfully when I requested them directly. In such cases, the load balancer stops serving all requests, effectively meaning that lemm.ee is unreachable for all users. I am still not sure what exactly caused the issue, but I will try to investigate more over the weekend.
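To illustrate the behaviour described above, here is a minimal sketch of probing each backend directly (bypassing the load balancer), plus a toy model of a balancer that has nothing to route to once every target is marked unhealthy. This is not Hetzner's implementation or the actual lemm.ee setup: the function names, the `/health` URLs in the usage note, and the "first healthy backend wins" routing are all assumptions made for illustration.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, Optional


def http_probe(url: str, timeout: float = 3.0) -> bool:
    """Return True if a direct GET to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Connection errors, timeouts, non-2xx responses and malformed URLs
        # all count as a failed health check.
        return False


def check_backends(
    urls: Iterable[str],
    probe: Callable[[str], bool] = http_probe,
) -> Dict[str, bool]:
    """Probe every backend directly and record whether it looks healthy."""
    return {url: probe(url) for url in urls}


def route(health: Dict[str, bool]) -> Optional[str]:
    """Toy load balancer: return the first healthy backend, or None.

    None models the outage case: with no healthy targets, the balancer
    has nowhere to send requests, so the site is unreachable.
    """
    for url, ok in health.items():
        if ok:
            return url
    return None
```

Calling `check_backends(["http://10.0.0.1/health", "http://10.0.0.2/health"])` with hypothetical backend addresses would show what each server reports when asked directly. Comparing that with what the load balancer believes is the kind of mismatch described in the post.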

For now, we have partially recovered, and I am continuing to work on remaining issues. Hopefully we will be back to 100% very soon. Sorry for the inconvenience!

top 29 comments
[-] UltraGiGaGigantic@lemm.ee 70 points 2 months ago

We appreciate what you do hero

[-] TLGA@lemm.ee 22 points 2 months ago

I was wondering what was going on: status.lemm.ee said the server was OK but federation was broken. Thank you for fixing it!

[-] sunaurus@lemm.ee 19 points 2 months ago

Sorry for the delay in updating the status page - I actually had gone out for lunch just a few minutes before the downtime started, so I didn't even realize anything was up until I was back at my computer about 45 minutes later 💀

[-] ToxicWaste@lemm.ee 6 points 2 months ago

No need to apologise. Still a better response time than some of the professionals I work with ;-)

[-] don@lemm.ee 20 points 2 months ago

I survived the July 18th lemm.ee downtime, and all I got was this lousy comment.

[-] Draegur@lemm.ee 11 points 2 months ago

All is forgiven, thank you for running this lovely instance ^_^

[-] p3e7@lemm.ee 8 points 2 months ago

Thanks for your great work and transparency!

[-] ramble81@lemm.ee 8 points 2 months ago

Nginx? I had an nginx LB shit itself yesterday. Luckily it auto-recovered and I had HA but just weird it happened.

[-] sunaurus@lemm.ee 8 points 2 months ago

Actually, we're using Hetzner's cloud load balancer for lemm.ee. But if this issue repeats in the near future, then I will definitely consider setting up something else.

[-] db0@lemmy.dbzer0.com 4 points 2 months ago
[-] eleitl@lemm.ee 2 points 2 months ago

It's probably a managed haproxy in Hetzner's case.

[-] Amanduh@lemm.ee 7 points 2 months ago

I'd like to speak to a manager /s

[-] EABOD25@lemm.ee 7 points 2 months ago

Would it be in bad taste to blame Russia?

[-] clot27@lemm.ee 7 points 2 months ago

Sometimes, downtimes are awesome. Get off your machine and spend time with your family, folks!

[-] fossphi@lemm.ee 7 points 2 months ago

Thanks for the quick fix! What did you have to do to get the load balancer working again?

[-] sunaurus@lemm.ee 14 points 2 months ago

For now, I just redeployed all of our servers completely, but as I don't know the actual root cause of the issue yet, I'm still investigating to figure out if anything more is needed.

[-] db0@lemmy.dbzer0.com 6 points 2 months ago

Typically when this happens, the issue is on the LB itself. Maybe its own network had issues?

[-] yournamehere@lemm.ee 4 points 2 months ago

love you guys!

[-] LedgeDrop@lemm.ee 4 points 1 month ago

Seriously, your professionalism in handling the situation and in reporting it is fantastic.

It's totally above and beyond anything we should expect for a service powered by donations!

Thank you!

[-] scytale@lemm.ee 4 points 2 months ago

I thought the entire lemmy network was down because status.lemm.ee was saying our instance was fine and federation wasn't working with every other instance. lol

[-] JimmyBigSausage@lemm.ee 4 points 2 months ago

Thank goodness! Hopefully discovering these vulnerabilities and protecting them will help keep Lemmy alive when the big dogs come in to sweep us away! (Worst fears)

[-] becausechemistry@lemm.ee 4 points 2 months ago
[-] SuperSpaceFan@lemm.ee 2 points 1 month ago

Thank you for keeping us abreast of what's happening. I appreciate you, and how you manage this instance.

[-] tacosanonymous@lemm.ee 2 points 2 months ago

Is there another instance where you could report issues?

If we logged into another account, we’d be able to see those before it comes back up.

[-] sunaurus@lemm.ee 9 points 2 months ago

There are two useful sections on https://status.lemm.ee for this - firstly, there is an automated check for federation with all other instances on the bottom of the page, and everything there being red is a definite sign that something is wrong with lemm.ee itself. Secondly, near the top of that page, I will always write a status message manually when I discover & start work on any issues. This second part can have a bit of a delay, as it requires manual input from myself, but I have updated it every time we had any issues so far.
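As a rough illustration of that rule of thumb (this is not the actual status.lemm.ee code; the function name, the result labels, and the instance names in the usage note are assumptions), the "everything red means the problem is local" reasoning could be sketched like this:

```python
from typing import Dict


def diagnose(federation_ok: Dict[str, bool]) -> str:
    """Classify a set of per-instance federation check results.

    `federation_ok` maps a remote instance name to True (green) or
    False (red). The labels returned here are illustrative only.
    """
    if not federation_ok:
        return "unknown"  # no checks ran, so nothing can be concluded
    failures = [name for name, ok in federation_ok.items() if not ok]
    if not failures:
        return "ok"  # every check is green
    if len(failures) == len(federation_ok):
        return "local"  # everything red: the problem is likely on our side
    return "remote"  # only some red: likely specific remote instances
```

For example, `diagnose({"instance-a.example": False, "instance-b.example": False})` returns `"local"`, matching the case where every federation check on the page is red.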

[-] tacosanonymous@lemm.ee 2 points 2 months ago

That’s good info. Thanks.

[-] EinfachUnersetzlich@lemm.ee 5 points 2 months ago

There's a Discord server in the sidebar that updates are posted in: https://discord.gg/XM9nZwUn9K

[-] tacosanonymous@lemm.ee 10 points 2 months ago

I know. I just don’t want to join a discord.

[-] veniasilente@lemm.ee 1 points 1 month ago

Totally healthy servers have a right to rest every once in a while too. Thanks for keeping us notified!

this post was submitted on 18 Jul 2024
197 points (100.0% liked)
