I have too many machines floating around, some virtual, some physical, and they get added and removed semi-frequently as I play around with different tools and try out ideas. One recurring pain point is that I have no easy way to manage SSH keys across them, and it's a pain to deal with adding, removing, and cycling keys. I know I can use AuthorizedKeysCommand in sshd_config to make the system fetch a remote key for validation, and I know I could theoretically publish my pub key to GitHub or the like, but I'm wondering if there's something more flexible/powerful where I can manage multiple users (essentially roles), such that each machine can be assigned a role and automatically allow access accordingly?
I've seen Keyper before, but the container hasn't been updated for years, and the owner of the support Discord actively kicks everyone from the server, even just for asking questions.
Is there any other solution out there that would streamline this process a bit?
I would switch to certificate based SSH authentication.
All the server keys get signed by your CA, and all the client keys get signed by your CA too. Everyone implicitly trusts each other through the CA, and it's as safe as regular SSH keys.
You can also sign short-lived client keys if you want to make revocation easier. The servers don't care, because all they check is that it's a valid cert issued by the CA, and the signing can be done entirely offline!
HashiCorp Vault can also help manage the above, but it's also pretty easy to do manually.
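To make the signing step concrete, here is a minimal sketch of wrapping ssh-keygen from Python. The flags (-s, -I, -n, -V) are standard ssh-keygen options; the function names, file names, and the "+2h" default are assumptions picked for illustration, not anything the setup above prescribes.

```python
import subprocess
from pathlib import Path


def build_sign_command(ca_key, pubkey, key_id, principals, validity="+2h"):
    """Return the ssh-keygen argv that signs `pubkey` as a user certificate."""
    return [
        "ssh-keygen",
        "-s", str(ca_key),           # CA private key used for signing
        "-I", key_id,                # key identity, recorded in sshd logs
        "-n", ",".join(principals),  # principals (login names) the cert is valid for
        "-V", validity,              # validity interval, e.g. +2h from now
        str(pubkey),                 # the public key to sign
    ]


def sign_user_key(ca_key, pubkey, key_id, principals, validity="+2h"):
    """Run ssh-keygen; it writes <name>-cert.pub next to the public key."""
    subprocess.run(
        build_sign_command(ca_key, pubkey, key_id, principals, validity),
        check=True,
    )
    pubkey = Path(pubkey)
    return pubkey.with_name(pubkey.stem + "-cert.pub")
```

Host keys are signed the same way with the extra -h flag. Since this only needs the CA private key and the public key to be signed, it can run on a machine that never touches the network.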
It's such an underrated feature. It baffles me how people immediately turn to overly complicated solutions solving a problem they don't really have to solve, just because everyone assumes the only way is the default commonly known way. Like OP, people immediately jump to the conclusion you need extra software to manage the keys, rather than using another authentication method natively supported, and keep filling their known_hosts file with junk, making the whole validation process useless because everyone just accepts whatever key the host presents.
It's amazing how simple it is. Developer needs temporary access to debug a web server? Sure, here's your 2h valid cert to log in as the web user on the server, don't even need to actually log into the server to put their key in and then remove it. I mint a cert and it's ready to go on whichever users and servers I specified in the cert. Can't even gain persistence because regular authorized_keys is disabled and we have limited session durations.
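For reference, the server side of this is only a few sshd_config lines. The sketch below renders them from Python; the directives (TrustedUserCAKeys, HostCertificate, AuthorizedKeysFile none) are real sshd_config options, while the file paths and the function name are assumptions for illustration.

```python
def render_sshd_cert_config(
    user_ca_pub="/etc/ssh/user_ca.pub",                  # assumed path to the CA public key
    host_cert="/etc/ssh/ssh_host_ed25519_key-cert.pub",  # assumed path to the signed host cert
    disable_authorized_keys=True,
):
    """Return an sshd_config fragment for certificate-based logins."""
    lines = [
        # Accept any user certificate signed by this CA.
        f"TrustedUserCAKeys {user_ca_pub}",
        # Present a CA-signed host certificate to connecting clients.
        f"HostCertificate {host_cert}",
    ]
    if disable_authorized_keys:
        # Skip per-user key files, so a dropped-in key cannot grant persistence.
        lines.append("AuthorizedKeysFile none")
    return "\n".join(lines) + "\n"


print(render_sshd_cert_config(), end="")
```

On the client side, a known_hosts line starting with @cert-authority followed by a host pattern and the CA public key replaces the per-host entries.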
I regularly leave people baffled at work because I come up with a clever and built-in super simple solution to something they expected to just slap more scripts and software to work around the only way they know to use the software. Read your manpages in full folks, it'll save you so much work. Know what your software is capable of.
I've been using https://github.com/warp-tech/warpgate for essentially this purpose. It does kind of put all of your eggs in one basket so don't expose this to the Internet and probably keep at least one other machine that has all the keys. I haven't had any catastrophic issues so far other than my host going down (unrelated to this tool).
Terrible idea of the day: You could use something like NFS and map the drive on all clients. On that drive you can have the latest keys then use symlinking to update, etc.
Something like Puppet, Chef, or Ansible is likely a better choice.
I quite like Tailscale SSH for this, but I don't have as many machines, so not sure how it will scale. You can definitely assign roles here to allow/deny SSH between hosts in your fleet though.
You could try SSH certificates using something like https://smallstep.com/sso-ssh/ - essentially you delegate validation of your public key to an IdP, which your servers are configured to trust.
The other approach would be something like Ansible or Puppet to deploy trusted keys to all servers.
Hm... these are both interesting but might be a bit overkill IMO.
I don't think I'd need a CA and intermediary step if all SSHd needs to do is check if a key is a currently approved key for this particular service or not; and I last looked at chef/puppet many years ago, and it was way too much orchestration work that we no longer need w/ Docker containers and smaller footprint host OSes.
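The "is this key currently approved for this machine's role" check could be sketched as an AuthorizedKeysCommand helper like the one below. sshd expects such a command to print authorized_keys lines on stdout. The policy layout, the /etc/ssh/role and /etc/ssh/key-policy.json paths, and the function names are all hypothetical; how the policy file gets onto each machine (or is fetched remotely) is left open.

```python
import json


def keys_for(user, machine_role, policy):
    """Return the authorized_keys lines for `user` on a machine with `machine_role`.

    Assumed policy layout (hypothetical):
      {"keys":  {"alice": ["ssh-ed25519 AAAA... alice@laptop"]},
       "roles": {"web": {"deploy": ["alice"]}}}
    i.e. roles -> login user -> people, and people -> public keys.
    """
    allowed_people = policy.get("roles", {}).get(machine_role, {}).get(user, [])
    keys_by_person = policy.get("keys", {})
    return [key for person in allowed_people for key in keys_by_person.get(person, [])]


def main(argv):
    """Entry point when sshd runs the script as: <script> %u"""
    user = argv[1]
    with open("/etc/ssh/role") as fh:             # hypothetical one-line role file
        machine_role = fh.read().strip()
    with open("/etc/ssh/key-policy.json") as fh:  # hypothetical policy file
        policy = json.load(fh)
    for line in keys_for(user, machine_role, policy):
        print(line)
```

A real script would finish by calling main(sys.argv) and be wired up in sshd_config with AuthorizedKeysCommand (passing %u) plus AuthorizedKeysCommandUser set to an unprivileged account. Printing nothing simply means no key is accepted for that user.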
Have an alias so trusted hosts can bounce through my authorization host and end up on a tmux session on the targeted host. It has logging and such but mostly it's for simplicity.
If I plan to use that connection a lot there's a script to cat my priv key through the relay.
Have an scp alias too, but that gets more complicated.
For more sensitive systems I have 2fa from gauth set up, works great.
Your idea sounds good: go ahead and publish your pubkey(s) at a fully public URL that you control and can memorize.
Then you can stash or memorize the curl command needed to grab them and authorize them on a new machine.
A lot of more complicated solutions are just fancy ways to safely move private keys around.
For my private keys, I prefer to generate a new one for each use case, and throw them out when I'm done with them. That way I don't need a solution to move, share or store them.
Edit: Full disclosure - I do also use Ansible to deploy my public keys.
You could use LDAP with OpenLDAP, Keycloak, FreeIPA, etc. to set SSH keys for users.
If you want something simpler, you could use Ansible (or another CM tool) or just have a startup script that downloads the authorized keys file from GitHub or wherever you can store it.
And if you want something less simple, HashiCorp Vault supports dynamic SSH keys using certificates.
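A minimal sketch of such a startup script, assuming the keys are published one per line at some URL (GitHub serves a user's keys at https://github.com/USERNAME.keys). The function names and the key-type filter are assumptions; the filter is deliberately strict and would also drop lines that carry authorized_keys options before the key type.

```python
import os
import tempfile
import urllib.request

# Prefixes of the key types accepted by this sketch.
KEY_TYPES = ("ssh-ed25519", "ssh-rsa", "ecdsa-sha2-", "sk-ssh-ed25519", "sk-ecdsa-sha2-")


def fetch(url, timeout=10):
    """Download the published key list as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def filter_keys(text):
    """Keep only lines that look like public keys; drop blanks, comments and junk."""
    return [ln.strip() for ln in text.splitlines() if ln.strip().startswith(KEY_TYPES)]


def install_keys(text, dest):
    """Atomically replace `dest` with the keys found in `text`; return the key count."""
    keys = filter_keys(text)
    if not keys:
        # A broken download must not lock everyone out by emptying the file.
        raise ValueError("refusing to install an empty key list")
    directory = os.path.dirname(os.path.abspath(dest))
    os.makedirs(directory, mode=0o700, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=directory)  # same directory, so the rename is atomic
    with os.fdopen(fd, "w") as fh:
        fh.write("\n".join(keys) + "\n")
    os.chmod(tmp, 0o600)
    os.replace(tmp, dest)
    return len(keys)
```

Usage would be along the lines of install_keys(fetch(url), os.path.expanduser("~/.ssh/authorized_keys")), run at boot or from a timer. Writing to a temp file and renaming means sshd never sees a half-written file.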
Yeah, the problem is that I have 2 physical servers, each with 5 to 10 VMs on it, and a bunch of other VMs scattered across different cloud providers; it gets tricky to edit the ~/.ssh/authorized_keys file on each of them to reflect a new SSH key (e.g. a new machine on the "network") or replace an existing SSH key (e.g. the annual key cycle).