espeak's default voice backend is fully synthesized, without using any real voice samples. That means it doesn't require downloading a huge package for each language, which is convenient in some cases, but the output is extremely robotic.
You can use MBROLA as a backend for espeak so that it uses voice samples, and the result should be less jarring (it'd still be easy to tell it's not a natural voice, but at least you'd be able to understand it better).
There's a tutorial on this here: https://github.com/espeak-ng/espeak-ng/blob/master/docs/mbrola.md
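For reference, once an MBROLA voice is installed, switching to it is just a matter of the `-v` flag. The voice name `mb-en1` below is only an example; which voices are available depends on what your distro packages (e.g. `mbrola-en1` on Debian/Ubuntu):

```shell
# Default fully-synthesized voice:
espeak-ng "Hello, this is the default voice."

# Same text through an MBROLA voice (mb-en1 must be installed):
espeak-ng -v mb-en1 "Hello, this is an MBROLA voice."

# Write to a wav file instead of playing through the speakers:
espeak-ng -v mb-en1 -w hello.wav "Hello, this is an MBROLA voice."
```

You can list the MBROLA voices your install knows about with `espeak-ng --voices=mb`.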
In addition to the very insightful reply by Ferk: sadly, most TTS development seems to be happening as online services these days. Google Neural TTS and Microsoft Azure TTS sound really great, but they require an online connection, an account, and possibly even payment (there's a free threshold; beyond it the cost is almost nothing, but almost nothing isn't free).
Btw, I don't know about the blind people you know, but the ones I know use such insanely fast TTS output that the "sounds nice" aspect isn't really there in the first place. At least not to me.
The development of Piper is being driven by the Home Assistant Project. That probably makes it one of the larger OSS TTS projects. Hope may not be lost yet ;)
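Piper is also easy to try locally; it reads text on stdin and writes a wav file. The model filename below is just an example, one of the voices from the project's releases page:

```shell
# Feed text to Piper and get a wav file back.
# en_US-lessac-medium.onnx is an example voice model; download one
# from the Piper releases and point --model at it.
echo "Hello from Piper." | piper \
  --model en_US-lessac-medium.onnx \
  --output_file hello.wav
```

It runs entirely offline once the model file is on disk, which is the whole point compared to the cloud services above.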
A few days ago I wrote down a couple of links to interesting TTS projects that I was going to look into whenever I have time, along with some brief notes.
I have a tiny laptop with the literal bare minimum to get this running, haha. You're probably right, but the models blow up your memory pretty quickly.
I did get some really good audio out of this model after a while. I threw the first chapter of The Hobbit at it and it seemed to be doing OK. It's better than espeak, and you only need to run it once to get an audiobook out.
If you don't mind doing some development work, needing online connectivity, and paying for usage, AWS Polly has some very good sounding TTS voices: https://aws.amazon.com/polly/