I've been able to generate very good results with this open-source project. You need a pretty good NVIDIA GPU, and it takes some time and tedious work to get it working the way you want it to:
Some voices sound exactly right. Others sound like a broken robot. The main reason I like it is that I can run it locally without having to sign up for some stupid cloud service.
I have only used it with American English. Oddly, it will sometimes slip into a British accent. I believe it is possible to retrain it on other languages, but I have not done the deep dive required to do so.