Tdarr is probably your best bet. Its main focus is video, but it uses FFmpeg as the backend, so anything FFmpeg supports is supported in Tdarr (theoretically).
This. I use Tdarr with the "migz convert audio" and "downmix & dynamic range compression" plugins to make sure all my videos have a stereo audio track, which gives me a far more consistent experience across my devices.
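For anyone curious what a stereo downmix with compression roughly looks like at the FFmpeg level, here is a minimal sketch in Python. It is not what those Tdarr plugins actually run (I have not checked their source); the function name `build_stereo_command`, the AAC codec choice, and the use of the `acompressor` filter with default settings are all my own assumptions for illustration.

```python
from pathlib import Path


def build_stereo_command(src, dst, compress=True):
    """Build an ffmpeg argument list that downmixes audio to stereo.

    Hypothetical helper for illustration only; not taken from any Tdarr plugin.
    """
    cmd = [
        "ffmpeg",
        "-i", str(Path(src)),
        "-c:v", "copy",   # leave the video stream untouched
        "-c:a", "aac",    # re-encode audio (assumed codec choice)
        "-ac", "2",       # downmix to two channels (stereo)
    ]
    if compress:
        # acompressor is FFmpeg's dynamic range compressor; defaults assumed here
        cmd += ["-af", "acompressor"]
    cmd.append(str(Path(dst)))
    return cmd


cmd = build_stereo_command("movie.mkv", "movie_stereo.mkv")
```

To actually run it you would pass the list to `subprocess.run(cmd, check=True)` with FFmpeg installed. Tdarr's value is that it does this kind of thing across a whole library automatically, so treat this as a way to test settings on one file first.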