One thing that this article misses is that multi-threaded executors can very well optimize for latency.
If a task happens to be slow (say, parsing a large JSON blob coming from a request), the other request handlers can still run on other worker threads, so that slow request's latency isn't added to theirs. A quick sketch of this is below.
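
As a minimal illustration (assuming Rust with tokio's multi-threaded runtime, since the article's executor isn't specified), one handler that blocks its worker thread for a while doesn't hold up the others; on a single worker they would all queue behind it:

```rust
use std::time::{Duration, Instant};

// Assumed dependency: tokio = { version = "1", features = ["full"] }
#[tokio::main(flavor = "multi_thread", worker_threads = 4)]
async fn main() {
    let start = Instant::now();

    // One slow handler, standing in for e.g. parsing a large JSON blob.
    // std::thread::sleep blocks its worker thread, like CPU-bound work would.
    let slow = tokio::spawn(async move {
        std::thread::sleep(Duration::from_millis(500));
        println!("slow handler done at {:?}", start.elapsed());
    });

    // Several fast handlers submitted at the same time.
    let fast: Vec<_> = (0..3)
        .map(|i| {
            tokio::spawn(async move {
                println!("fast handler {i} done at {:?}", start.elapsed());
            })
        })
        .collect();

    for h in fast {
        h.await.unwrap();
    }
    slow.await.unwrap();
    // On the multi-threaded runtime the fast handlers finish almost
    // immediately; with worker_threads = 1 they would each wait ~500 ms
    // behind the slow one.
}
```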