@CupiaBart: So here's a story of, by far, the weirdest bug I've encountered in my CS career.
Along with @maciejwolczyk, we've been training a neural network that learns how to play NetHack, an old roguelike game (pictured in the screenshot). Recently, something unexpected happened.
I had an issue where a client reported a crash on login. The exception and stack trace were very generic and offered no clues to the cause. I tried debugging but could not reproduce it. I eventually figured out that the crash only happened in release (non-debug) builds that were obfuscated. I couldn't find the troublesome code, so I figured out which release introduced the issue, then which commit, and then went change by change until I found the cause. It turned out to be a log message in a location completely unrelated to login. That exact log message was fine a few lines up, and other code worked fine in that location. For some unknown reason, having that log message in that specific location caused a crash in a completely different area of code.
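The release-then-commit narrowing above is a search for the first bad version. Here is a minimal Python sketch of doing that search by binary search, which is the technique `git bisect` automates. It is not necessarily what the commenter did. The sketch assumes the commits are ordered oldest to newest, the first one is known good, the last one is known bad, and the bug stays once introduced. `first_bad_commit` and the `is_bad` predicate are hypothetical names. In practice, `is_bad` would build that commit and run a reproduction test.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is True.

    Assumes: commits are oldest-to-newest, commits[0] is good,
    commits[-1] is bad, and the bug persists once introduced.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # bug already present: first bad commit is at or before mid
        else:
            lo = mid  # still good: first bad commit is after mid
    return commits[hi]


if __name__ == "__main__":
    history = [f"c{i}" for i in range(100)]
    # Hypothetical predicate: pretend the bug was introduced in c37.
    print(first_bad_commit(history, lambda c: int(c[1:]) >= 37))
```

This needs only about log2(n) checks instead of n. However, it relies on `is_bad` being reliable. A timing-dependent crash like the ones discussed in this thread can make a good commit look bad (or vice versa) and send the search the wrong way.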
This is usually a sign of multiprocessing/multithreading going wrong, e.g. accessing the same resource without proper locking, such as opening the same logfile from different processes and having them write to it simultaneously.
Errors like these can be triggered just by reformatting the code (or obfuscating it, in this case), which slightly changes the runtime behaviour, such as timing.
They're hard to find, especially since they depend on the speed and workload of the machine running the code.
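Here is a minimal Python sketch of the unprotected-shared-resource problem described above. It uses threads in a single process rather than separate processes. A `threading.Lock` does not coordinate across processes. For that case you would need OS-level file locking or a single writer, e.g. `logging.handlers.QueueHandler` with a `QueueListener`. `SharedLog` and `run` are hypothetical names used only for illustration. Each record is written in several pieces, so without the lock another thread can slip its output in between.

```python
import io
import re
import threading


class SharedLog:
    """Hypothetical log writer shared by several threads (illustration only)."""

    def __init__(self, stream, use_lock=True):
        self._stream = stream
        # One lock guards the whole stream; None means "no locking".
        self._lock = threading.Lock() if use_lock else None

    def _write_parts(self, parts):
        for part in parts:
            self._stream.write(part)

    def write_record(self, worker_id, msg):
        # A record is emitted as several separate write() calls. Without the
        # lock, another thread may run between them and interleave its output.
        parts = [f"[worker {worker_id}] ", msg, "\n"]
        if self._lock is None:
            self._write_parts(parts)
        else:
            with self._lock:  # no other thread using this SharedLog can write mid-record
                self._write_parts(parts)


def run(num_workers=8, records_per_worker=1000, use_lock=True):
    """Have num_workers threads log concurrently; return the resulting lines."""
    stream = io.StringIO()
    log = SharedLog(stream, use_lock=use_lock)

    def worker(wid):
        for i in range(records_per_worker):
            log.write_record(wid, f"record {i}")

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stream.getvalue().splitlines()


# Shape of an intact, non-interleaved record line.
RECORD = re.compile(r"\[worker \d+\] record \d+")

if __name__ == "__main__":
    lines = run(use_lock=True)
    intact = all(RECORD.fullmatch(line) for line in lines)
    print(len(lines), intact)
```

With `use_lock=False`, the lines may or may not come out interleaved. How often this happens depends on the interpreter, the scheduler and the machine's load. That run-to-run variability is exactly what makes these bugs hard to reproduce.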