I've spent some time searching this question, but I have yet to find a satisfying answer. The majority of answers that I have seen state something along the lines of the following:
"It's just good security practice."
"You need it if you are running a server."
"You need it if you don't trust the other devices on the network."
"You need it if you are not behind a NAT."
"You need it if you don't trust the software running on your computer."
The only answer that makes any sense to me is #5. #1 leaves a lot to be desired, as it advocates for doing something without thinking about why you're doing it -- it is essentially a non-answer. #2 is strange -- why does it matter? If one is hosting a webserver on port 80, for example, they are going to poke a hole in their router's NAT at port 80 to open that server's port to the public. What difference does it make to then have another firewall that needs to be port forwarded? #3 is a strange one -- what sort of malicious behaviour could even be done to a device with no firewall? If you have no applications listening on any port, then there's nothing to access. #4 feels like an extension of #3 -- only, in this case, it is most likely a larger group that the device is exposed to. #5 is the only one that makes some sense; if you install a program that you do not trust (you don't know how it works), you don't want it to be able to readily communicate with the outside world unless you explicitly grant it permission to do so. Such an unknown program could be the door to get into your device, or a spy on your device's actions.
If anything, a firewall only seems to provide extra precautions against mistakes made by the user, rather than actively preventing bad actors from getting in. People seem to treat it as if it's acting like the front door to a house, but this analogy doesn't make much sense to me -- without a house (a service listening on a port), what good is a door?
This question reads a bit to me like someone asking, "Why do trapeze artists perform above nets? If they were good at what they did they shouldn't fall off and need to be caught."
Do you really need a firewall? Well, are you intimately familiar with every smidgeon of software on your machine, not just userland ones but also system ones, and you understand perfectly under which and only which circumstances any of them open any ports, and have declared that only the specific ports you want open actually are at every moment in time? Yes? You're that much of a sysadmin god? Then no, I guess you don't need a firewall.
If instead you happen to be mortal like the rest of us who don't read and internalize the behaviors of every piddly program that runs or will ever possibly run on our systems, you can always do what we do for every other problem that is too intensive to do manually: script that shit. Tell the computer explicitly which ports it can and cannot open.
Luckily, you don't even have to start from scratch with a solution like that. There are prefab programs that are ready to do this for you. They're called firewalls.
Tell the computer explicitly which ports it can and cannot open.
Isn't this all rather moot if there is even one open port, though? Say, for example, that you want to mitigate outgoing connections from potential malware that gets installed onto your device. You set a policy to drop all outgoing packets in your firewall; however, you want to still use your device for browsing the web, so you then allow outgoing connections to DNS (UDP, and TCP port 53), HTTP (TCP port 80), and HTTPS (TCP port 443). What if the malware on your device simply pipes its connections through one of those open ports? Is there anything stopping it from siphoning data from your PC to a remote server over HTTP?
The point of the firewall is not to make your computer an impenetrable fortress. It's to block any implicit port openings you didn't explicitly ask for.
Say you install a piece of software that, without your knowledge, decides to spin up an SSH server and start listening on port 22. Now you have that port open as a vector for malware to get in, and you are implicitly relying on that software to fend it off. If you instead have a firewall, and port 22 is not one of your allowed ports, the rogue software will hopefully take the hint and not spin up that server.
Generally you only want to open ports for specific processes that you want to transmit or listen on them. Once a port is bound to a process, it's taken. Malware can't just latch on without hijacking the program that already has it bound. And if that's your fear, then you probably have a lot of way scarier theoretical attack vectors to sweat over in addition to this.
Yes, if you just leave a port wide open with nothing bound to it, either via actually having the port reserved or by linking the process to the port with a firewall rule, and you happened to get a piece of actual malware that scanned every port looking for an opening to sneak through, sure, it could. To my understanding, that's not typically what you're trying to stop with a firewall.
In some regards a firewall is like a padlock. It keeps out honest criminals. A determined criminal who really wants in will probably circumvent it. But many opportunistic criminals just looking for stuff not nailed down will probably leave it alone. Is the fact that people who know how to pick locks exist an excuse to stop locking things because "it's all pointless anyway"?
Once a port is bound to a process, it’s taken. Malware can’t just latch on without hijacking the program that already has it bound.
Is this because the kernel assigns that port to that specific process, so that all traffic at that port is associated with only that process? For example, if you have an SSH server listening on 22, and another malicious porgram decides to start listening on 22, all traffic sent to 22 will only be sent to the SSH server, and not the malicious program?
EDIT (2024-01-31T01:20Z): While writing this, I came across this stackoverflow answer, which states that when a socket is created it calls some bind() function that attaches it to a port. This makes me wonder how difficult it would be for malware to steal the bound port.
Is this because the kernel assigns that port to that specific process, so that all traffic at that port is associated with only that process?
Yes, that's what ports do. They split your IP connection into 65,536 separate communication lines, that's the main thing, but that is specifically 65,536 1-on-1 lines, not party lines. When a process on your PC reserves port 80, that's it. It's taken. Short of hacking the kernel itself, it cannot be reassigned or stolen until the bound process frees it.
The SO answer you found it interesting, I was not aware that the Linux kernel had a feature that allowed two or more processes to willingly share a single port. But the answer explains that this is an opt-in parameter that the first binding process has to explicitly allow. And even then, traffic is not duplicated to all listening processes. It sounds like it's more of a "first come first serve" to whichever of the processes are free to read the incoming message at the time it arrives, making it more of a load balancing feature that isn't a useful vector for eavesdropping.