So I am not a professional programmer, but I do like to tinker with projects and just teach myself stuff (in python and now rust).
I currently just install stuff on my linux distro off of the repos or anaconda for python. I've never had any particular issue or anything. I was thinking about maybe moving projects into a container just so that they are more cleanly separated from my base install.
I am mostly wondering about how the community uses containers and when they are most appropriate and when they are more trouble than they're worth. I think it will be good for learning, but want to hear from people who do it for a living.
Containers are used for a whole bunch of reasons. I'll address just
one: process isolation. I'll only do one because I've run into times
when containers were not helpful. And it may lead to some funny stories
and interesting discussion from others!
A rule of thumb for me is that if the process is well-behaved, has its
dependencies under control and doesn't keep unnecessary state, then it
may not need the isolation provided by a container and all the tooling
that comes with it.
On one extreme, should we run ls in a container? Probably not.
It doesn't write to the filesystem and has only a handful of
dependencies available on pretty much any Unix-like/Linux system.
But on the other extreme, what about that big bad internal Node.JS
application which requires some weird outdated Python dependencies
and has many hardcoded paths and package versions?
The original developer is long gone.
It dumps all sorts of shit to the filesystem.
Nobody is really sure whether those files are used as a cache or they contain some critical state management.
Who wants to spend the time and money to tidy that thing up?
In this scenario containers can be used to hermetically seal a fragile thing.
This can come back to bite you.
Instead of actually improving the software to be portable and robust
enough to work in varied execution environments (different operating
systems, on a laptop, as a library...), you kick the can down the road.
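For concreteness, the "hermetically seal it" approach might look something like this sketch of a Dockerfile. Every image, package, and path here is made up for illustration:

```dockerfile
# Hypothetical sketch: freeze the fragile legacy app's whole environment
# in an image. All names and versions below are invented for illustration.
FROM node:14-buster

# Pin the weird outdated Python dependency alongside Node
RUN apt-get update && apt-get install -y python2.7 \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY . .
RUN npm ci

# Whatever it dumps to the filesystem stays inside the container
VOLUME /app/state
CMD ["node", "server.js"]
```

The point being: nothing about the app got better, you just drew a box around it.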
And FYI to OP, if you can't install two versions of the same library at the same time (e.g. numpy 1.25 and numpy 1.19), then the answer to "has its dependencies under control?" is generally "no".
Deno (from the creator of NodeJS) is "yes" by default, and has very few exceptions
Rust can by default, and has few but notable/relevant exceptions
Python (without venv) cannot (even with venv each project can be different, but one project still can't reliably have two versions of numpy)
NodeJS can, but it was kind of an afterthought, and it has tons of modules that are notable exceptions
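A quick way to see why a single Python process can't hold two versions of the same library: imports are cached per module name with no version in the key, so every import of a given name resolves to one object. A minimal sketch (using the stdlib json module as a stand-in for numpy):

```python
import sys

# Python caches imports in sys.modules, keyed by name alone (no version).
# A second import returns the cached module, so one process can only ever
# see one installation of a given library at a time.
import json
import json as json_again

assert json is json_again             # same object, not a second copy
assert sys.modules["json"] is json    # the cache entry both names share
```

That cache is the reason venvs isolate *projects* from each other but can't give one project two numpys.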
No worries! Writing that down actually helped clarify some of my thoughts.
Something extra: distributed computing.
Let's say you have 3 processes that need to communicate with one
another. There's heaps of tooling available in OSs to manage those
processes. Logging, networking, filesystem access, privilege
separation, resource allocation... all provided by the host OS without
installing anything. But what if those 3 processes can't run on one
"machine"? Which process should go where? What if it needs 8GB memory
but there's only 6GB available on some of the machines? Who controls
that?
Systems like Kubernetes, Nomad, Docker Swarm etc. offer a way to
manage this. They let us say something like:
run this process (by specifying a container image),
give it at least these resources (x GB memory, x vCPUs), and
let it communicate with these other processes (e.g. pods, overlay networks...).
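As a sketch, those three asks map onto something like this Kubernetes pod spec. The image and names are made up for illustration:

```yaml
# Hypothetical pod spec; image, registry, and names are invented.
apiVersion: v1
kind: Pod
metadata:
  name: worker
  labels:
    app: worker        # other pods/services select on this to talk to it
spec:
  containers:
    - name: worker
      image: registry.example.com/worker:1.0   # "run this process"
      resources:
        requests:                              # "give it at least these resources"
          memory: "8Gi"
          cpu: "2"
```

The scheduler then answers "which process goes where?" for you: a node with only 6 GB free simply won't get this pod.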
These systems manage containers. If you want to do distributed
computing and want to take advantage of those systems to manage it,
stuff needs to be run in containers.
Containers are not the only way to do distributed computing - far from
it! But over the past few years this particular approach has become
popular in the, umm... "commercial software development industry".
Opinion. Are Linux containers something to look into as someone who doesn't work in the industry?
Unless you're interested in how containers themselves work and/or distributed computing,
frankly, no. Computers are still getting faster and cheaper.
So why is all this stuff so popular in the commercial world?
I'll end with something tongue-in-cheek.
Partly it's because the software development industry is made up of
actual human beings who have their own emotions and desires.
Distributed computing is a fun idea because tech people are faced with
challenges tech people are interested in.
Boring: can we increase our real estate agency brand recognition by 200%?
We could provide property listings as both a CSV and PDF to our partners!
Our logo could go on the PDF! Wow! Who knows how popular our brand could be?
Fun: can we increase throughput in this part of the system by 200%? We
might need to break that component out to run on a separate machine!
Wow! Who knows how fast it could go?
I usually use nix to manage my development environments.
At the root of the git repo for my blog, there is a shell.nix file. That file declares an entire shell environment, giving me tools, environment variables, and other things I need. I just run nix-shell in the same directory as the shell.nix file, and it creates that shell environment.
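To give a flavor of it, a minimal shell.nix along those lines might look like this. The packages and the environment variable are just examples, not what my actual file contains:

```nix
# Minimal example shell.nix; packages and variables are illustrative.
{ pkgs ? import <nixpkgs> {} }:

pkgs.mkShell {
  packages = [ pkgs.python3 pkgs.hugo ];   # tools available inside the shell
  MY_BLOG_ENV = "development";             # extra attributes become env vars
}
```

Running nix-shell next to that file drops you into a shell where those tools are on PATH, without installing anything system-wide.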
I think lxc/incus (same thing) containers are kinda excessive for this case, because those containers are a full Linux system, complete with an init system and whatnot. Such a thing is going to use more resources (RAM, CPU, and storage space), and it's also going to be more to manage compared to application containers (docker, podman), which are typically very stripped down and come with only what is needed to run the application.
I used to use anaconda, but switched away because it didn't have all the packages I wanted and couldn't control the versions of installed packages very well, whereas nix does both very well. Anaconda is very similar in usage though, especially once you start setting up multiple virtual anaconda environments for separate projects. However, I don't know if anaconda is as portable as nix is, able to create an entire environment from a single file of code.
Simplest way in my experience is distrobox. It will automatically spin up a container (podman or docker depending on your distro) for you with a fresh OS. Go ahead and install whatever without touching your main install, spin one up for each major project and one for scratching around. Instructions only available for Arch or Ubuntu? No problem. One gotcha: make sure they have their own homes, or you'll get junk in yours...
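If it helps, the basic workflow is only a couple of commands. The container name and image below are just examples, and --home is what gives it its own home directory:

```shell
# Create a container from whatever distro image you like
distrobox create --name rust-dev --image archlinux:latest --home ~/boxes/rust-dev

# Get a shell inside it; install packages there without touching the host
distrobox enter rust-dev

# Blow it away and start fresh when things go sideways
distrobox rm rust-dev
```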
They allow you to just install and test pretty much anything you want and if it doesn't go well... just rebuild the container and start again. Rebuilding a container takes about 5 seconds to fix problems that would take 5 weeks of headaches if you made the same mistake on your main operating system.
If apt-get install wants to install a bunch of dependencies you're not sure about, oh well give it a try and see how it goes. That's definitely not an approach you can use successfully outside of a container.
Another benefit of containers is you can have two computers (e.g. a desktop and a laptop) and easily share the exact same container between both of them.
Personally I use Docker, because there are rich tools available for it and also it's what everyone else I work with uses. I can't speak to whether Incus is better, as I've never used it.
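On the sharing point: the usual pattern is to push the image to a registry from one machine and pull it on the other. The registry and image names here are made up:

```shell
# On the desktop: build the image and push it to a registry
docker build -t registry.example.com/myproject:latest .
docker push registry.example.com/myproject:latest

# On the laptop: pull and run the exact same image
docker pull registry.example.com/myproject:latest
docker run --rm -it registry.example.com/myproject:latest
```

Both machines then run a byte-identical environment, which is the whole appeal.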