submitted 4 days ago by danielquinn@lemmy.ca to c/linux@lemmy.ml

From time to time, often after I've restored from sleep or finished playing a Steam game, one of my CPU cores is pinned at 100% with no indication of what might be doing it. Running htop, btop, or GNOME system monitor all show the same thing: CPU0 at 100% while the rest are doing near-nothing, and no process in particular seems to be using those resources.

If I restart, it's back to normal, and sometimes I can play a game in Steam or let the computer go to sleep without this happening. But it happens often enough to be annoying and confusing, so I'd like to know if there's a way to either (a) diagnose which processes are using which CPU cores, or (b) somehow "reset" the checking of these values to make sure that something's not just being misreported.

This is a desktop system running Arch & GNOME.

[-] Tenkard@lemmy.ml 8 points 2 days ago

Another quick tip for htop: the red color in the CPU bar means kernel stuff. In my case it was an issue with interrupts
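
For a rough number to go with the red bar, the kernel's share of one core's time can be read from /proc/stat. This is a minimal sketch, not something from the thread: the function name kernel_share and its optional file argument are made up for illustration, and the counters are cumulative since boot, so it shows a long-run average rather than the current load.

```shell
# Print the percentage of a CPU's time spent in kernel code
# (system + irq + softirq), based on cumulative counters since boot.
# $1: CPU label as it appears in /proc/stat, e.g. cpu0
# $2: stat file to read (hypothetical parameter; defaults to /proc/stat)
kernel_share() {
    awk -v cpu="$1" '
        $1 == cpu {
            # Fields: label user nice system idle iowait irq softirq steal
            total = 0
            for (i = 2; i <= 9; i++) total += $i
            if (total > 0) printf "%.1f\n", 100 * ($4 + $7 + $8) / total
        }
    ' "${2:-/proc/stat}"
}
```

Calling kernel_share cpu0 twice, a few seconds apart, and comparing the results gives a hint of whether kernel time on that core is still growing.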

[-] Zykino@programming.dev 3 points 2 days ago

Where can I learn what each color means? Is it buried in man htop somewhere? Or on a website?

[-] Tenkard@lemmy.ml 5 points 1 day ago* (last edited 1 day ago)

F1 or h while in htop should show the guide

[-] danielquinn@lemmy.ca 2 points 2 days ago

I had no idea! Thanks for the tip.

[-] piexil@lemmy.world 84 points 4 days ago

Show kernel threads, it's a setting in the htop config menu that is off by default.
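
Outside htop, ps can list threads (kernel threads included) together with the core each one last ran on. A minimal sketch, assuming the procps version of ps; the helper name on_cpu is hypothetical:

```shell
# Keep the header line plus every row whose PSR (processor) column
# matches the requested core.
# Expects the output of `ps -eLo pid,tid,psr,pcpu,comm` on stdin.
on_cpu() {
    awk -v cpu="$1" 'NR == 1 || $3 == cpu'
}
```

For example: ps -eLo pid,tid,psr,pcpu,comm --sort=-pcpu | on_cpu 0 | head. Note that the %CPU figure from ps is averaged over the lifetime of the thread, so a kernel worker that only recently went wild may not sort to the top; top -H shows current usage instead.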

[-] danielquinn@lemmy.ca 48 points 4 days ago* (last edited 4 days ago)

There it is! Thank you! It's a process owned by root called kworker/0:0+kacpid. Any idea what that is?

[Edit 1] Interestingly, I can't even kill -9 it.

[Edit 2] With kworker kacpid to work with, I did a quick search and found this SO page that has some interesting information that I only partially understand, but the following worked like a charm:

# grep -Ev "^[ ]*0" /sys/firmware/acpi/interrupts/gpe?? | sort --field-separator=: --key=2 --numeric --reverse | head -1
/sys/firmware/acpi/interrupts/gpe09:11131050     STS enabled      unmasked
# echo disable > /sys/firmware/acpi/interrupts/gpe09

It's not clear to me what an interrupt is or whether this gpe09 value is meant to be persistent across reboots, or why this only seems to be happening in the last couple months, but if I can make it go away by running the above from time to time, I guess it's alright?

[-] scrion@lemmy.world 45 points 4 days ago* (last edited 4 days ago)

An interrupt is an input that can be triggered to interrupt normal execution. It is used, e.g., by hardware devices to signal to the processor that something has happened that requires timely processing, so that real-time behavior can be achieved (for variable definitions of real-time). Interrupts can also be triggered by software, and this explanation is a gross oversimplification, but it covers what is most likely relevant and interesting for your case at this point.

The commands you posted will sort the interrupts and output the one with the highest count (via head -1), thereby determining the interrupt that gets triggered the most. It will then disable that interrupt via the user-space interface to the ACPI interrupts.
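
The lookup half of that pipeline can be wrapped in a small function. This is only a sketch: busiest_gpe and its directory argument are illustrative names, the short sort flags are equivalent to the long options in the original command, and -H is added so grep prints the file name even if only one file matches.

```shell
# Print the GPE file with the highest interrupt count.
# $1: directory holding the gpeXX files (hypothetical parameter;
#     defaults to the sysfs path used in the thread)
busiest_gpe() {
    dir="${1:-/sys/firmware/acpi/interrupts}"
    # Drop GPEs whose count is 0, then sort the rest by count, highest first.
    grep -H -Ev '^[ ]*0' "$dir"/gpe?? |
        sort -t: -k2 -n -r |
        head -n 1
}
```

Run without arguments, it prints a line like the gpe09 one quoted above; it only reads the counters and changes nothing.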

One of the goals of ACPI is to provide a kind of general hardware abstraction without knowing the particular details about each and every hardware device. This is facilitated by offering (among other things), general purpose events - GPEs. One of these GPEs is being triggered a lot, and the processing of that interrupt is what causes your CPU spikes.

The changes you made will not persist after a reboot.
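
Because the setting lives in sysfs, it has to be reapplied after each boot, and it can be undone by writing enable to the same file. A minimal sketch, to be run as root; set_gpe and its optional directory argument are hypothetical names for illustration, and gpe09 is simply the value from the output above, which may differ on other machines.

```shell
# Write "enable" or "disable" to one GPE's sysfs file.
# $1: GPE name such as gpe09
# $2: action, either enable or disable
# $3: directory holding the gpeXX files (hypothetical parameter;
#     defaults to the sysfs path used in the thread)
set_gpe() {
    gpe="$1"
    action="$2"
    dir="${3:-/sys/firmware/acpi/interrupts}"
    case "$action" in
        enable|disable) ;;
        *) echo "usage: set_gpe gpeXX enable|disable [dir]" >&2; return 2 ;;
    esac
    printf '%s\n' "$action" > "$dir/$gpe"
}
```

For example, set_gpe gpe09 disable repeats the fix, and set_gpe gpe09 enable reverts it without a reboot.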

Since this is handled by kworker, you could try and investigate further via the workqueue tools: https://github.com/torvalds/linux/tree/master/tools/workqueue

In general, Linux will detect if excessive GPEs are generated (look for the term "GPE storm" in your kernel log) and stop handling the interrupts by switching to polling. If that happens, or if the interrupts are manually disabled, the system might not react to certain events in a timely manner. What that means for each particular case depends on what the interrupts are responsible for, which is hard to tell without additional details.
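
Searching the kernel log for that term can be done with a one-line filter. A small sketch; the function name gpe_storm_lines is made up for illustration, and it prints nothing if the log has no such message.

```shell
# Print kernel log lines (read from stdin) that mention a GPE storm,
# ignoring case.
gpe_storm_lines() {
    grep -i 'gpe storm'
}
```

On a systemd-based system such as Arch, journalctl -k -b | gpe_storm_lines checks the current boot; dmesg | gpe_storm_lines does the same where dmesg is readable.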

[-] far_university190@feddit.org 2 points 4 days ago

"the system might not react to certain events in a timely manner."

But it would still react? Is there a resource where I can read more?

[-] scrion@lemmy.world 4 points 3 days ago

I'll post some links, but it's a pretty busy week for me already, so give me some time.

[-] cmnybo@discuss.tchncs.de 42 points 4 days ago

That's a kernel worker for ACPI. It sounds like you may have a driver for something that is misbehaving.

[-] Atemu@lemmy.ml 1 points 3 days ago

More likely it's the device firmware, and you probably can't fix that.

[-] astrsk@fedia.io 32 points 4 days ago

It’s the Linux version of Steam taking advantage of idle time to process shaders. It’s a critical part of making all those Proton-launched games work right. I wish it had better control over when it runs, but it is what it is.

[-] cheviotveneer@sh.itjust.works 8 points 3 days ago

If this were true, OP would see Steam as a user-mode process taking up the CPU time. Since the OP image is sorted by CPU time and the process isn't visible, it's gotta be those kernel threads that aren't displayed by default.

[-] Commiunism@lemmy.wtf 9 points 4 days ago

Just as a PSA, the feature is currently somewhat bugged and really should be avoided. For anything that's not a low-end PC, your machine can handle the compilation during runtime easily and do it much faster.

For low-ends, it compiles so many unnecessary shaders (such as all workshop content that you might not even have), it often takes 10x longer to compile everything (which you have to recompile on every driver or game cache update) than just playing the game and watching a replay first or something.

[-] Quail4789@lemmy.ml 5 points 4 days ago

This isn't the case here and you can turn the background processing off or change how many cores it'll use.

[-] JustEnoughDucks@feddit.nl 2 points 3 days ago* (last edited 3 days ago)

"Critical" as in not really needed.

It is very bugged and constantly runs even if it isn't doing anything. It will also max out your disk IO for hours at a time with an HDD for larger game storage.

I have had it off for 1.5 years across 3 OS installs and have never had a problem with stuttering or shader related problems in that time. It is really not needed anymore for 95% of games since the Linux async solutions were merged.

Maybe if one uses severely out of date kernels it is critical

[-] technocat@lemm.ee 16 points 4 days ago* (last edited 4 days ago)

I have been fighting with this for a long time, do you have an external monitor? I find this happens if I wake from light-sleep (not hibernation) while an external monitor is plugged in.

One of my ACPI interrupts just goes off the charts.

[-] danielquinn@lemmy.ca 10 points 4 days ago

In one of the other comments, we worked out that it was definitely something to do with ACPI, but yes I do have an external monitor. This is a desktop system.

Disabling the interrupt did the job, but I don't know why it's happening. If this is related to the monitor, could this be an Nvidia thing?

[-] technocat@lemm.ee 6 points 3 days ago* (last edited 3 days ago)

I have a pretty old integrated Intel GPU. Happens to my Thinkpad pretty regularly.

[-] muhyb@programming.dev 1 points 3 days ago

A similar thing happens to me with my two-monitor setup. No problem when I use a single monitor. No problem when I use two monitors. However, when I unplug the second monitor or switch to a single monitor with my script, the CPU starts doing random spikes on single cores at short intervals. Only a reboot fixes this.

[-] thingsiplay@beehaw.org 14 points 4 days ago

It's probably shader compilation. Funnily enough, the top result of my web search is my own thread on Reddit from 4 years ago. I had this exact same question on my old computer: https://www.reddit.com/r/linux_gaming/comments/kyf1wf/why_is_steam_using_one_core_always_but_doing/ Shader compilation is done from time to time in the background while Steam runs. This prepares games to run better.

Check whether there is a process called fossilize (it typically shows up as fossilize_replay). That was the name of the process doing the shader compilation back then.
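
A quick way to check for it from a terminal is pgrep. A minimal sketch, assuming the procps pgrep; the wrapper name shader_jobs is hypothetical.

```shell
# List running processes whose name contains "fossilize", with their
# full command lines; print a note if there are none.
shader_jobs() {
    pgrep -a fossilize || echo "no fossilize process running"
}
```

If this lists nothing while a core is still pinned, background shader processing is probably not the cause.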

[-] davel@lemmy.ml 2 points 4 days ago

I’d never even heard of shader compilation. Apparently for the Steam Deck, Valve provides pre-compiled shaders for some games. See "What Is Shader Compilation and Why Does It Make PC Games Stutter?"

[-] thingsiplay@beehaw.org 5 points 4 days ago

Yes. That's the benefit of having a single hardware target. The same goes for consoles. They obviously know the hardware (as in the Steam Deck's case) and can precompile the shaders and ship them. There were plans (or just talks? I'm not sure if this was ever realized) for users to download precompiled shaders from other users if they have the exact same hardware.

[-] Quail4789@lemmy.ml 4 points 4 days ago

Steam can download precompiled binaries that are suitable for any system, if they exist. If you turn it on, they'll also collect shaders from you for others to download (not P2P).

It's often said in r/linux_gaming that you no longer need shader precompilation, though without giving any reason. In my experience, turning it off doesn't have any performance penalty. But games with baked-in shader compilation will take 10 minutes to do it themselves on every launch, which is annoying af.

[-] Atemu@lemmy.ml 1 points 3 days ago

If you have a reasonably up to date mesa and use a Proton version with a new enough DXVK, DXVK can utilise Graphics Pipeline Libraries to link shaders just like a d3d11 driver on Windows would, eliminating stutter.

I believe shader precomp is used for some video codec edge cases though, so YMMV depending on the game.

[-] Commiunism@lemmy.wtf 2 points 4 days ago

The problem with Fossilize is that it's currently bugged and precompiles way more than is needed, such as all workshop content that you might not have installed, which takes a really long time and bloats the shader cache. On anything that's not low-end, it's pretty much a waste of time, since shader compilation is easily done at runtime.

Some issues on the things I've mentioned that Valve doesn't seem to have responded to yet: bloat, time

[-] thingsiplay@beehaw.org 1 points 4 days ago

Thanks for the confirmation. So shaders ARE actually shared after all, which is a good thing. Maybe for certain games it's no longer necessary to have shader pre-compilation enabled. Is that because the games do it themselves, or because people who download the compiled shaders from Valve (or the collected ones) come to the conclusion that the pre-compilation option is no longer needed? It's hard to say if people do not explain their recommendation. It's also not a straightforward and easy thing to test, so people can easily end up with the wrong conclusions.

As for the annoyance factor, every update requires pre-compilation (if enabled, and only for those games that need it, of course). And if you have a lot of games installed, it can be really annoying too.

[-] j4k3@lemmy.world 5 points 4 days ago

I've had this happen with AI stuff that runs in a Python venv. It only happens with apps that use multi threading, and usually when something is interrupted in an unintended or unaccounted for way. I usually see it when I start screwing with code stuff, but also from changing the softmax settings during generation or crashing other stuff while hacking around. There may be a bug of some kind, but I think it likely has more to do with killing the root threading process and leaving an abandoned child that doesn't get handled by the kernel process scheduler in the standard way. If this happens I restart too.

[-] electricprism@lemmy.ml 1 points 3 days ago

Is this why people use Zen?

[-] Atemu@lemmy.ml 4 points 3 days ago

No, it wouldn't make any sort of difference.

[-] ReversalHatchery@beehaw.org 1 points 4 days ago

Since your CPU has 16 threads ("cores", but not really cores; you probably only have 8 of those), a process that uses up all the capacity of a single core will show 100/16 = ~6% CPU usage. In my experience, looking for this really works... at least on Windows, please don't hurt me. It should work on Linux too, but there I don't have it in such a visible place.

This may not work as well, though, when your system is under higher load and the process you're looking for also has higher CPU usage, like 30% or something.
In that case, look at the CPU usage of the individual threads of the processes with higher CPU usage. If a process has a thread at 6% CPU usage (on a CPU with 16 hardware threads), then that process is at fault. The name of the thread may even tell you what its purpose is.
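
The 100/16 arithmetic can be computed for any machine. A small sketch; the function name one_core_share is hypothetical, and it falls back to the count reported by nproc when no argument is given.

```shell
# Percentage of total CPU capacity that one fully busy hardware thread
# represents.
# $1: number of hardware threads (defaults to the count from nproc)
one_core_share() {
    awk -v n="${1:-$(nproc)}" 'BEGIN { printf "%.2f\n", 100 / n }'
}
```

On Linux, top -H lists threads individually. Keep in mind that top's default mode reports usage relative to a single core, so a thread saturating one core reads close to 100% there rather than about 6%.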

[-] MonkderVierte@lemmy.ml 5 points 3 days ago

"at least on windows, please don't hurt me"

Nah fam, you're hurt enough.

this post was submitted on 27 Oct 2024
153 points (100.0% liked)
