Laptop randomly freezes. How to debug it?
Hi all, sorry if I'm posting this in the wrong place.
I have a laptop running Mint with qtile which sometimes freezes, to the point where nothing responds and I need to kill it. I've tried:
sudo journalctl
But I don't get any information that helps me.
Can anyone help to debug it?
I would look at these things first.
Great idea. I will try some time with XFCE. One of the challenges is that I don't know how to trigger the crash, and sometimes it can be days, other times multiple times within an hour.
Boot memtest
Leave it to do its thing overnight. That will at least check for badly failing RAM.
I've run this on machines that I thought were ok, only to find... they weren't.
Can you check /var/log/syslog? Something like:
tail -f /var/log/syslog
may help out. That file usually has the answers. It's just hard to find the exact lines.
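A minimal sketch of one way to find those lines, assuming /var/log/syslog exists on your system (newer Debian-based installs may only have the journal). The function name and the keyword list are made up for illustration and are certainly not exhaustive:

```shell
# Print lines (with line numbers) from a log file that contain common
# trouble keywords. The keyword list is an illustrative assumption.
suspicious_lines() {
    grep -inE 'error|fail|timeout|hung|reset|oops|panic' "$1"
}

# Example (may need sudo, depending on file permissions):
#   suspicious_lines /var/log/syslog | tail -n 50
```

The line numbers make it easier to open the file afterwards and read what happened just before and after a hit.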
There is an absurd amount of this:
That seems to be there all the time, and also before crashing/freezing.
If your machine is a Tuxedo laptop, this thread might interest you. Seems as though this user was hitting thermal limits and their laptop would freeze/poweroff to keep from dying.
+1 to this line of inquiry.
When I've managed to get a modern Linux desktop to freeze, I've had a bad power supply, or heat issues, each time (specifically when I have had a full freeze, where even the alternate terminals didn't respond).
Interesting that the power supply might be the problem. Never thought about that. Sounds like a nightmare to come to that conclusion.
Maybe something to do with your audio?
https://www.reddit.com/r/ManjaroLinux/comments/127diso/rtkitdaemon_furiously_logging_warning_messages_in/
Maybe https://bugs.launchpad.net/debian/+source/rtkit/+bug/1547589
not sure.
I am assuming the problem is that your RAM fills up, and the Linux kernel's OOM killer is too slow and doesn't kill processes early enough. You have two options to try:
or you can use both options to get the best out of your laptop.
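One way to test the out-of-memory theory after a reboot is to look for OOM killer messages in the kernel log of the previous boot. This is only a sketch: the function name is hypothetical, it assumes the journal from the previous boot was kept, and a hard freeze may prevent the messages from ever reaching the disk, so an empty result does not rule the theory out.

```shell
# Count kernel log lines that mention the OOM killer. Reads from stdin.
# "|| true" keeps the exit status at 0 when grep finds nothing.
count_oom_lines() {
    grep -ciE 'out of memory|oom-killer|oom_reaper' || true
}

# Example: check the kernel messages of the previous boot.
#   journalctl -k -b -1 --no-pager | count_oom_lines
```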
This is my first time hearing about zram. I tried looking into it myself but there are still some things I couldn't find answers to. Are there any downsides to using zram, or is it something that I can set and forget about? If I were to be using compression tools, like 7zip, would I be able to use a higher dictionary size than I normally can, or would zram cause problems outside of potentially slower compression speeds?
Edit: I think I'm just going to test this myself. I've been reading more about zram and alternatives and it seems like there's conflicting information.
It might not be a bad idea to have btop running with something like a 100ms refresh. Watch it while using the laptop and see what the thermals are, as everyone is suggesting, or whether anything else CPU-related is acting weird. I've had weird freezes before due to lack of power, or from being on a screwed-up power plan.
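Since the screen is frozen when it matters, it may also help to log temperatures to disk so the last reading before a freeze survives. A rough sketch, assuming the kernel exposes sensors under /sys/class/thermal (values there are in millidegrees Celsius); the function name and log path are placeholders:

```shell
# Print one timestamped line with every thermal zone reading in degrees C.
# The optional argument overrides the sysfs directory to read from.
read_temps() {
    base="${1:-/sys/class/thermal}"
    line="$(date '+%F %T')"
    for zone in "$base"/thermal_zone*; do
        [ -r "$zone/temp" ] || continue
        milli=$(cat "$zone/temp")
        line="$line $(cat "$zone/type" 2>/dev/null)=$((milli / 1000))C"
    done
    echo "$line"
}

# Example: append a reading every 2 seconds and flush it to disk,
# so the last lines survive a hard freeze.
#   while true; do read_temps >> ~/temps.log; sync; sleep 2; done
```

After the next freeze, the tail of the log shows whether the temperature was climbing right before it.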
What laptop is it?
I'll try to monitor the temperature more closely, but nothing suggests that anything is wrong (it doesn't feel hot, and the freeze can happen without doing anything intensive).
It's a Tuxedo laptop.
Can you Ctrl+Alt+F4 to another terminal? Could you ssh in from another machine?
No, I can't access another tty during a freeze, unfortunately.
That may be a hint!
The only times I've had the desktop freeze on Linux and the alternate terminals fail to respond, I had a hardware issue.
In one case, I was on a Raspberry Pi, and my power supply was not delivering clean enough power for the board.
In another case, my fan wasn't connected properly and the motherboard was overheating.
Edit: Oh! I think I had this behavior once with a RAM stick that was terribly subtly not quite all the way in the slot.
Since you specifically mention qtile, you should undo your customizations and see if that fixes your problem.
If not, you should look at the journal after reboot:
journalctl -b
(-b on its own shows the current boot; if you had to reboot after the freeze, the boot you want is -b -1.)
But you'll need to filter it.
Try
journalctl -b | grep -v rtkit-daemon
which will remove the masses of entries you bemoaned in another comment (AFAICS all syslog entries should also be in the journal anyhow).
Very important:
Please make note of when the problem happened, and if your journal entries even go that far in time.
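A sketch of how that could look in practice, assuming the journal is persistent (if journalctl --list-boots only shows the current boot, it is probably being kept in memory only). The timestamps are placeholders and the function name is made up:

```shell
# Drop known-noisy lines from journal output read on stdin.
# rtkit-daemon is the noise source mentioned in this thread; add more
# patterns with extra -e options if other units flood the log.
drop_noise() {
    grep -v -e 'rtkit-daemon'
}

# List the boots the journal still has, to confirm it reaches back far enough:
#   journalctl --list-boots
#
# Show the previous boot, limited to a window around the freeze
# (the timestamps below are placeholders), minus the noise:
#   journalctl -b -1 --since "2024-01-01 14:00" --until "2024-01-01 14:10" --no-pager | drop_noise
```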
Also, add -r to show the log in reverse. If you want to look at previous boots other than the last you can specify like so:
journalctl -b -2 -r
I just got the following freeze that it recovered from by itself (rarely happens). After filtering the log a bit I get the following from around the time it happened.
Nice, that looks pretty obvious.
In addition to the other reply, you should search for reports of your distro* having problems with (certain) AMD GPUs; maybe all you need is a backported kernel.
* I don't think you ever mentioned it. If it's Ubuntu-based, search for Ubuntu.
I use Linux Mint Debian Edition (LMDE).
Could it be a CPU/iGPU too new for the kernel LMDE is running? I haven’t used LMDE in a while, so I imagine that if it’s still based on bookworm the in-kernel drivers could be too old. If it’s trixie-based that probably isn’t the problem though.
Looks like the chipset/graphics driver is crashing. That can be because of the driver or the hardware.
It will be hard to diagnose, but you can search for the most detailed of those log lines together with your laptop model and see if that yields anything. The problem is that it's never possible to know whether you have a software issue or a hardware issue that is exposed by particular software.
You can try installing a completely different OS (e.g. Windows) to see if the same problem occurs. If it does, you can be fairly sure it's hardware.
What are the machine’s specs? I had a similar issue on a Ryzen 5 3500U laptop before, but more recent kernels (6.8+) don’t exhibit that behavior.
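To see where the running kernel stands relative to a given version, something like the following could work. It is a sketch that assumes a sort with -V support (GNU coreutils has it); the function name is made up, and 6.8 is only the version mentioned above, not a confirmed fix for this laptop:

```shell
# Succeed if version $1 is at least version $2, using sort -V.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example: compare the running kernel with 6.8.
#   if version_at_least "$(uname -r)" 6.8; then echo "6.8 or newer"; else echo "older than 6.8"; fi
```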
maybe the ram gets full?
i'd have htop and dmesg -w running visibly and see if there's any issue showing on the screen as it freezes
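If the screen freezes too fast to read anything, logging free memory to disk is another option. A small sketch, assuming /proc/meminfo has the usual MemTotal and MemAvailable fields; the function name and log path are placeholders:

```shell
# Print MemAvailable as a whole-number percentage of MemTotal.
# Reads a meminfo-style file; defaults to /proc/meminfo.
mem_available_pct() {
    awk '/^MemTotal:/ {total=$2} /^MemAvailable:/ {avail=$2}
         END {if (total > 0) printf "%d\n", avail * 100 / total}' "${1:-/proc/meminfo}"
}

# Example: log a line every 2 seconds and flush it to disk.
#   while true; do echo "$(date '+%F %T') $(mem_available_pct)%" >> ~/mem.log; sync; sleep 2; done
```

If the last entries before a freeze still show plenty of memory available, RAM filling up becomes a less likely explanation.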
Is this on an old ThinkPad or an older Intel processor? I once had an issue like this that turned out to be the processor not entering one of the C-states properly. The fix was disabling the C-state responsible. I can't remember how to do this now; it involved editing a text file, but after this there were no problems.
It was really difficult to debug because it would hard crash with nothing suspicious in the logs. I think it was a random hardware bug as it would occasionally happen in windows also. Try disabling idle states completely to see if that is the cause.
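For a quick experiment without editing any config file, the kernel's cpuidle sysfs interface lets you inspect and disable idle states until the next reboot. This is a sketch, not necessarily the method used above (that was more likely a kernel boot parameter such as processor.max_cstate or intel_idle.max_cstate, but that is a guess). The function name is made up:

```shell
# List the idle states the kernel exposes for one CPU, with their
# "disable" flag (0 = in use, 1 = disabled).
list_idle_states() {
    base="${1:-/sys/devices/system/cpu/cpu0/cpuidle}"
    for state in "$base"/state*; do
        [ -r "$state/name" ] || continue
        echo "$(basename "$state") $(cat "$state/name") disable=$(cat "$state/disable")"
    done
}

# Example: temporarily disable every state except state0 on all CPUs
# (run as root; lasts until reboot):
#   for f in /sys/devices/system/cpu/cpu*/cpuidle/state[1-9]*/disable; do echo 1 > "$f"; done
```

Expect higher power draw and more heat while the states are disabled. If the freezes stop, re-enabling states one at a time should narrow down which one is responsible.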
It might also be overheating. If you can monitor the temperature it might give you a clue.
I checked that at some point, and I don't think that was the case, although sometimes during a crash the fans start going crazy.
What about ctrl-alt-F3?
This is to enter a different TTY, right? That doesn't work either.
Correct. Sometimes it works when everything else fails.