cross-posted from: https://sopuli.xyz/post/34381286

I've been having issues with my homelab ever since I set it up a few months ago. For some reason the server becomes unresponsive, as if it were offline. However, when I access its CLI, it spews out this message continuously.

I've tried entering commands directly into the CLI, but it shows an 'input/output' error instead. I cannot even get it to shut down through the CLI, so I have to manually pull the plug.

Here's another screenshot of the logs in the CLI a few moments just after the error occurred.

The issue does not even go away after I switch it off and on. Sometimes the homelab gets stuck indefinitely on the startup loading screen, fails to detect the system partition at the GRUB stage, crashes the Linux kernel, or refuses to boot altogether. It is only mitigated when I leave the homelab switched off for 5 minutes or so.

The weird thing is that there is no way to predict when this error will come up. On some occasions the server works completely unhindered for a few weeks straight, and on others it breaks down just a few minutes after startup. It doesn't seem to depend on which services I am hosting, all of which are lightweight.

Additionally, once it does start working again, there seems to be no record of the error in the logs, apart from the number of unsafe shutdowns. This, coupled with the fact that its occurrence is random, makes the matter difficult to debug or even document. I've tried running several diagnostic tools, including smartctl, but I am unable to deduce anything useful from them.

Some specs and info about the homelab are as follows:

  • Build: Prebuilt compact mini PC
  • CPU: Intel i7-14700
  • RAM: 16GB
  • Storage: 1TB SSD
  • GPU: Integrated Intel UHD Graphics 770
  • Operating System: Ubuntu 24.04 LTS

I would really appreciate it if you could point out the cause of this issue. This experience makes the server unreliable, which is why I don't feel comfortable hosting anything valuable or sensitive on it yet. I can provide additional details or logs if required.

sylver_dragon@lemmy.world 3 points 1 week ago

With intermittent errors like that, I'd work through the following test plan:

  1. Check for disk errors - You already did this with the SMART tools.
  2. Check for memory errors - Boot a USB drive to memtest86 and test.
  3. Check for overheating issues - Thermal paste does wear out, so check your logs for overheating warnings.
  4. Power issues - Is the system powered straight from the wall or a surge protector? While it's less of an issue these days, AC power coming from the wall should have a consistent sine wave. If that wave isn't consistent, it can cause a voltage ripple on the DC side of the power supply. This can lead to all kinds of weird fuckery. A good surge protector (or UPS) will usually filter out most of the AC inconsistencies.
  5. Power Supply - Similar to above, if the power supply is having a marginal failure it can cause issues. If you have a spare one, try swapping it out and seeing if the errors continue.
  6. Processor failure - If you have a spare processor which will fit the motherboard, you could try swapping that in and seeing whether the errors continue.
  7. Motherboard failure - Same type of thing. If you have a spare, swap and look for errors.
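
For step 1, it can help to pull out just the handful of SMART fields that tend to matter for a flaky drive, rather than reading the full report. What follows is only a rough sketch, assuming the SSD is NVMe (the original post doesn't say) and a smartmontools build with JSON output (7.0 or later). The function names are made up for illustration, and the JSON key names are written from memory of smartctl's NVMe output, so compare them against your own `smartctl --json -a` output before trusting the result.

```python
import json
import subprocess


def run_smartctl(device):
    """Run smartctl with JSON output on `device` and return the parsed report.

    Usually needs root. smartctl uses its exit status as a bitmask, so a
    non-zero status can still come with valid JSON; check=True is avoided.
    """
    proc = subprocess.run(
        ["smartctl", "--json", "-a", device],
        capture_output=True,
        text=True,
    )
    return json.loads(proc.stdout)


def summarize_nvme_health(report):
    """Pick the health fields most relevant to intermittent I/O errors.

    Key names are an assumption based on smartctl's NVMe JSON layout;
    missing keys come back as None rather than raising.
    """
    log = report.get("nvme_smart_health_information_log", {})
    return {
        "passed": report.get("smart_status", {}).get("passed"),
        "critical_warning": log.get("critical_warning"),
        "media_errors": log.get("media_errors"),
        "num_err_log_entries": log.get("num_err_log_entries"),
        "unsafe_shutdowns": log.get("unsafe_shutdowns"),
        "temperature_c": log.get("temperature"),
    }
```

Something like `summarize_nvme_health(run_smartctl("/dev/nvme0"))` would be the typical call, where `/dev/nvme0` is a placeholder for whatever device node the drive actually has. A clean summary doesn't rule the drive out, since a drive that drops off the bus may never get the chance to log the failure.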

At this point, you'll have tested basically everything and likely found the error. For most errors like this, I've rarely seen it go past the first two tests (drive/RAM failure), with the third (heat) picking up the majority of the rest. Power issues I've only ever seen in old buildings with electrical systems which probably wouldn't pass an inspection. Though, bad power can cause other hardware failures. It's one reason to have a surge protector in line at all times anyway.
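
For the heat check, one low-effort option is to read the kernel's thermal zones directly and log them on a timer, so there is a temperature trail leading up to the next hang. This is a minimal sketch, assuming the standard Linux sysfs layout under /sys/class/thermal, where each zone's `temp` file holds millidegrees Celsius. The function names and the 85 C limit are arbitrary choices for illustration, not values taken from any spec for this CPU.

```python
from pathlib import Path


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {zone_name: (sensor_type, temp_celsius)} for each thermal zone."""
    readings = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            sensor_type = (zone / "type").read_text().strip()
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            # Some zones refuse reads or report nothing useful; skip them.
            continue
        # sysfs reports millidegrees Celsius.
        readings[zone.name] = (sensor_type, millideg / 1000.0)
    return readings


def hot_zones(readings, limit_c=85.0):
    """Filter readings down to zones at or above limit_c (assumed threshold)."""
    return {name: r for name, r in readings.items() if r[1] >= limit_c}
```

Calling `hot_zones(read_thermal_zones())` from a cron job or systemd timer and appending the output to a file on a different disk or a USB stick would keep the trail even if the system drive is the part that drops out.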

this post was submitted on 28 Sep 2025