So, I am looking for any ideas, help or head(s) smarter than mine.
This has been my main problem for some time now. The easiest way to replicate it is to run anything Proton-related. After that, some apps just keep spitting this via dmesg:
Well, I don't think that there's going to be a way to narrow it down just from that. Can maybe suggest some things to try.
Segfaults happen when an app tries to access memory that it shouldn't. That tends to indicate either a bug (which is unlikely to be at the application level, given that multiple applications are doing it) or corrupt memory.
INTEL i7-4790
Well, it's not the recent 13th+14th gen Intel hardware problems; I had two processors produce corrupt memory via that route recently. That's a ten-year-old processor, though.
Memory would be an obvious thing to blame, but memtest not hitting it is an argument against that.
Proton-related stuff -- Windows games running under Steam -- might use a fair bit of memory. Do you have swap space, maybe?
I can imagine a swap device corrupting memory. If you're using any swap, I'd probably do a fresh boot, use sudo swapoff -a to disable swap space, and then try your repro case with proton.
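If it helps to confirm whether swap is actually active before and after the swapoff, here's a minimal sketch that reads /proc/swaps. The function name active_swap_devices is just something I made up for illustration, and it assumes the usual /proc/swaps layout of one header line followed by one line per swap area.

```python
from pathlib import Path


def active_swap_devices(swaps_text: str) -> list[str]:
    """Return the names of swap areas listed in /proc/swaps-style text."""
    lines = swaps_text.strip().splitlines()
    # Assumption: the first line is the column header
    # (Filename, Type, Size, Used, Priority), so skip it.
    return [line.split()[0] for line in lines[1:] if line.strip()]


swaps_file = Path("/proc/swaps")
if __name__ == "__main__" and swaps_file.exists():
    devices = active_swap_devices(swaps_file.read_text())
    if devices:
        print("Swap is active on:", ", ".join(devices))
    else:
        print("No active swap.")
```

An empty list after sudo swapoff -a would mean the repro run is happening without any swap in play.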
It's possible that there's some kind of kernel bug relating to memory allocation that you're tripping, I guess, but you're running an up-to-date Arch, so I assume that you've got a recent kernel, and you're saying that this has been going on for "some time". A kernel bug would at least be consistent with the problems continuing once they start on a given boot and a reset making them go away, I suppose.
I guess it could hypothetically be a problem with the Nvidia drivers... if the Proton games you're playing are 3D games, that might be tripping it.
A game might also be stressing the CPU, triggering temperature or other issues. The stress utility would let you generate sustained CPU load on cores (--cpu), maybe see if that reproduces your problem.
Proton games might be loading the GPU...I don't know of a good way to artificially generate load there, unfortunately.
You could try checking the kernel log for any errors preceding the segfaults, maybe while you're running that Proton game; any issues there might give a clue.
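To make that log check less tedious, something like the following rough sketch could pull out the few lines before each segfault report. It assumes you've saved the kernel log to a text file first (for example with dmesg > kernel.log) and that the segfault reports contain the text "segfault at"; the function name lines_before_segfaults and the file name are made up for illustration.

```python
import sys
from collections import deque


def lines_before_segfaults(log_lines, context=5):
    """Yield (segfault_line, preceding_lines) for each segfault report."""
    recent = deque(maxlen=context)  # rolling window of the last few lines
    for line in log_lines:
        line = line.rstrip("\n")
        # Assumption: kernel segfault reports contain "segfault at".
        if "segfault at" in line:
            yield line, list(recent)
        recent.append(line)


# Usage (hypothetical file name): python3 segfault_context.py kernel.log
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log_file:
        for hit, before in lines_before_segfaults(log_file):
            print("\n".join(before))
            print(">>>", hit)
            print("-" * 40)
```

Anything that keeps showing up right before the segfaults (GPU driver complaints, machine check errors, I/O errors) would be worth chasing first.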
Which memtests have you tried? They all function a little differently, and passing one doesn't mean it will pass another. My rig passed OCCT and TM5 with flying colors, but it would fail every time on prime95 (until I eventually got it stable).
I just went for the memtest86 and let it do its thing. Should I do some more testing? I really feel like this is not about memory, at least not in a “physically damaged” way.
It might not be a memory thing. If you run out of options and are down to trying memory again, take a look at the MemTestHelper test recommendations. You shouldn't have to run tests for more than ≈0.5–1.5 hours at a time (the 8hr+ testing regimen is pointless).