So there are multiple sites and groups that pirate video games, especially on PC. I was wondering if there are places on the internet where you can find source code for games, especially the highly modifiable ones like Half Life 2/Portal and Skyrim. Or groups that crack into the source code of games (or even software in general), not only for PC but maybe PS, Xbox or mobile too, and share it. I just wanted to see some code samples of games or their engines, maybe I'll get hooked on video game design. Shout out to Valve for sharing a lot about the creation of Half Life 1.
From what I understand: during compilation the code becomes unreadable to humans and needs to be reverse engineered, which entails insane amounts of work. So, unless there is a leak or the game isn't fully compiled, like (I think) Unity games, it is unlikely you will find source code.
The basic flow of compilation is: (human readable-ish) source code gets parsed into an intermediate representation, which the compiler then applies optimizations to before converting it into machine code.
With no optimizations, it is a pretty trivial process to revert back. You lose the names of variables and functions (unless the compiler kept them as symbols or debug info, which it often does for many reasons) but you get the logic.
With optimizations, things get messier. An example that is pretty popular and useful for understanding this is the fast inverse square root (https://en.wikipedia.org/wiki/Fast_inverse_square_root). The coder likely wrote invsqrt(foo), expecting this to map to a series of hardware instructions in the floating point unit. Instead, they see that it became a series of integer operations and bit shifts and get very, very confused. It's like that, except imagine an entire 5000 line file becomes two instructions.
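For anyone who hasn't seen it, this is roughly what that trick looks like in C. Treat it as a minimal sketch of the widely published Quake III-style routine (which was written by hand, not spat out by a compiler), assuming a 32-bit IEEE 754 float. It uses memcpy instead of the original pointer cast to stay within defined behavior, and the function name is only illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Approximates 1/sqrt(x) for positive, finite x. */
float fast_invsqrt(float x)
{
    uint32_t bits;
    float y;

    assert(sizeof bits == sizeof x);   /* the trick assumes a 32-bit float */
    memcpy(&bits, &x, sizeof bits);    /* reinterpret the float's bits as an integer */
    bits = 0x5f3759dfu - (bits >> 1);  /* magic constant minus half the bits: rough first guess */
    memcpy(&y, &bits, sizeof y);       /* reinterpret the integer as a float again */

    /* One Newton-Raphson step to tighten the guess. */
    return y * (1.5f - 0.5f * x * y * y);
}
```

Nothing in the body says "square root" anywhere, which is the point: if you only saw the shift, the subtraction and the multiply, you would have a hard time guessing what the author meant.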
That said, compilers are stupid and most people (rightfully) lean on libraries for a large portion of their code. So the above example would actually not happen, because they would just see the instructions for a function call labeled "fast_invsqrt" and not care. Similarly, basic heuristics (or, if you are trying to get funding, machine learning and neural nets) can be applied to figure out what was done in heavily optimized blocks of code. So the resulting "source code" might be something like:
int foo(int bar)
{
    int a = 4;
    // Loop to do some fancy ass math shenanigans to a
    return a + 5;
}
Which results in more or less human readable code again. Also, a lot of compilers will actually embed basically comments in the assembly/binary unless someone remembers to actively disable that. And nobody ever does.
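One hedged illustration of how source text ends up in a shipped binary: check and logging macros commonly stringify the condition and record the file and function name, so those strings sit in the executable as plain text. The sketch below assumes a made-up CHECK macro; check_impl, last_failure and player_take_damage are hypothetical names, not from any real engine.

```c
#include <stdio.h>

/* Hypothetical check macro: #expr turns the source text of the condition
   into a string literal, and __FILE__ / __func__ add the file and function
   names. All three end up stored in the compiled binary. */
#define CHECK(expr) check_impl((expr), #expr, __FILE__, __func__)

/* Text of the most recent failed check (kept only for demonstration). */
static const char *last_failure = "";

static int check_impl(int ok, const char *text, const char *file, const char *func)
{
    if (!ok) {
        last_failure = text;
        fprintf(stderr, "%s: %s: check failed: %s\n", file, func, text);
    }
    return ok;
}

/* Returns the new health, ignoring nonsensical negative damage. */
int player_take_damage(int health, int damage)
{
    if (!CHECK(damage >= 0))
        return health;
    return health - damage;
}
```

Running a tool such as strings over a build of this would typically turn up "damage >= 0" and "player_take_damage", which gives a reverse engineer real names to hang the logic on.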
Because source code really doesn't matter all that much. If anyone ever did a deep dive of the Source Engine and Unreal Engine (of the day. I think that was 3?)... they were actually a lot more similar than not. Yeah, there was a lot of infrastructure and even some fundamental differences (been a minute, but I want to say Source is still additive because of its Quake origins whereas UE3 was still subtractive geometry?). But... function influences form and they have most of the same functionality.
The issue is more, like with DRM, those "week one sales" as it were. Data is sparse for a lot of reasons, but SecuROM for Mass Effect 1 PC is generally treated as a massive win because warez groups could not properly crack it for a week or so, which led to a lot of PC players just buying the game because we needed hot blue lady action NOW. Same with the reason so many games still use Denuvo.
And same with engine and game features. Unity and Unreal basically were in an arms race for years (and CryEngine basically sucked on the barrel of their own gun) where one would add a feature and the other would add the same one a few months later. Having access to the source code potentially means someone can accelerate that a lot...
Except they won't. Because this shit is cancer. Any dev who does anything even remotely similar to how Wolverine jacks off to his friend's wife in a game is opening themselves up to a LOT of lawsuits and investigations. It's why almost every emulator group will go scorched earth on anyone who even acknowledges looking at the Nintendo leaks. And it is why someone quitting their job at Microsoft to go work at Google and offering to provide a Teams feature gets blacklisted and reported immediately. Companies don't fuck around with that.
Which leads to the real reason these are so detrimental to gaming and software. People... put questionable comments in their code all the time. Sometimes it is ///TODO: Implement this in Q2 2024 for release 1.5. Sometimes it is //Hey, Fuck you Fred. I finally did this. Stop your fucking bitching and stop whining to John that you couldn't do anything until I did this for you. And sometimes you get bullshit like Bungie (?) using racial slurs as codenames for a lot of the skins and the like...
And it also makes it a lot easier to make a cracked binary if you know what the code was before the DRM was applied.
So decompiling code is actually a lot more viable than you would expect. But also entirely pointless.
Correct. However, this looks like a direction that language processing and GPTs would excel in and, to my knowledge, there have already been some ML addons made for certain RE tools, so this might become easier in the future. Whether that is necessarily a good thing is a different story...