"Now I don't wanna" ~The Bug
Another angle to try is to set the date one day ahead and see if the bug shows up then. Might need to disconnect from network and set it in the BIOS for the test to work properly.
I could be wrong, but I figure after being off for an hour, all capacitors should have discharged by then, so it's probably not based on how long the hardware has been unpowered.
Though one other angle I just thought of, if you have something that runs periodically, maybe the bug is related to that period being missed once or n times. Or it could be related to something that is meant to wake the computer to run some job and then go back to sleep but instead just sets it in a bad state.
The date/time aspect is an interesting thought. For a bit more context, this machine is a Raspberry Pi connected to several other devices, some via USB and some via a CAN network. The system gets powered on manually, the user performs a task, then shuts it down until they need it again. We only use the date/time for logging. The system is connected to our Wi-Fi at our facility, but after we ship it, it will likely never be connected to the internet except maybe when we're servicing it and updating code. I don't think the Pi has an RTC. I don't really see how the date/time could be causing the issue I'm seeing (it seems to be lag in communication with the devices on the CAN network), but I guess stranger things have happened.
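If it helps to put numbers on that lag, here is a rough sketch of timing round trips with a monotonic clock, which keeps counting steadily even if the wall clock jumps (as it can on a board without an RTC once the network time sync kicks in). `send_and_wait` is a hypothetical stand-in for whatever sends one request to a CAN device and blocks until the reply arrives; it is not from any real library.

```python
import statistics
import time


def measure_round_trips(send_and_wait, n=20):
    """Time n calls to send_and_wait and summarise the latencies in seconds.

    send_and_wait is a hypothetical callable: it should send one request
    and return once the matching response has been received.
    """
    samples = []
    for _ in range(n):
        start = time.monotonic()  # monotonic: unaffected by date/time changes
        send_and_wait()
        samples.append(time.monotonic() - start)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }
```

Logging that summary on the first boot of the day and again on a later boot would give something concrete to compare.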
Ah that's interesting. If you can swap the devices from one Pi to another, try powering it all up on machine A, then swap the devices to machine B and power that on. That might tell you whether the issue is on the Pi side or with the devices.
Is latency higher on the first boot than on subsequent ones? I'd be looking into race conditions if you're seeing a bit of lag cascade out into bigger problems. Race conditions are the worst, especially when the race most often goes the right way and just occasionally goes the wrong way. Though you can force the wrong way by adding delays in your code, if you have an idea of where the race is happening.
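To illustrate the "add a delay to force the race" trick, here is a toy sketch, not anything resembling the real system. It assumes a made-up setup step and a made-up reader that checks a flag after a fixed wait; stretching the injected delay makes the reader lose the race every time instead of occasionally.

```python
import threading
import time


def run_once(setup_delay_s):
    """Return True if the reader saw the setup finished, False if it lost the race."""
    state = {"ready": False}
    result = {}

    def setup():
        # Injected delay: stands in for a slow start-up path.
        time.sleep(setup_delay_s)
        state["ready"] = True

    def reader():
        # The reader assumes setup is done after a fixed short wait.
        time.sleep(0.05)
        result["ok"] = state["ready"]

    threads = [threading.Thread(target=setup), threading.Thread(target=reader)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["ok"]
```

With no delay the reader wins; with a delay longer than the reader's wait it reliably loses, which is what makes the bug reproducible enough to debug.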
We have 3 theoretically identical systems here and this same issue occurs on 2 of them. The 3rd one... has bigger issues right now. That would be interesting to see what happens if I swap the Pis around but I'd give it >95% chance the same thing happens.
The important bit is to power one on first before the swap, then you'll have one setup where the pi was recently powered on and another setup where the connected devices were recently powered on. You might see the issue on only one of the devices, at which point you can say if it's the pi being off for a while or the devices that triggers the issue.
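Spelled out as a little decision table, in case it's useful. This is only my reading of the swap test, and the function name and wording are made up for illustration. "Cold" means that part had been off for a long time before the test, while the other part had been powered recently.

```python
def interpret_swap(bug_with_cold_pi, bug_with_cold_devices):
    """Interpret the two swapped setups.

    bug_with_cold_pi: bug seen when the Pi was long off, devices recently powered.
    bug_with_cold_devices: bug seen when the devices were long off, Pi recently powered.
    """
    if bug_with_cold_pi and bug_with_cold_devices:
        return "either side being off is enough, or the trigger is something else"
    if bug_with_cold_pi:
        return "points at the Pi having been off"
    if bug_with_cold_devices:
        return "points at the devices having been off"
    return "neither alone triggers it; maybe both must be off, or it is something else"
```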
Good point. I disabled the internet on both systems so when I come in on Monday hopefully I can confirm whether or not the date/time aspect is a problem. I'll try this as well.
Oh, apologies for my suggestion before seeing this comment hahaha!
CAN devices I have limited experience with, but I know at least in the automotive industry, vehicles often have various CAN devices that have various sleep states. Like, shut car off, it holds brake system for a few minutes and then unlocks the brakes and that ECU shuts down. Later on, an emissions ECU may run a self-diagnostic. After a few days being powered off, the security ECU goes into low power and turns off wireless doorlocks. After the voltage drops too low, the ECU in the head unit ostensibly shuts down, and the next time the car is started, the head unit has to do a cold-reboot and takes a fortnight.
Could be one of those CAN devices takes some time to get into the "off-adjacent" state to manifest the bug?
Largely for safety reasons, anytime the system is turned off power is instantly cut to the entire system. All of our CAN devices boot up much faster than the Pi does. Once the Pi boots, it sets up CAN communication.