LLMs up to 4x Faster With Latest NVIDIA Drivers on Windows

blogs.nvidia.com
Large Language Models up to 4x Faster on RTX With TensorRT-LLM for Windows | NVIDIA Blog

Comments (sourced from Hacker News)
Why just Windows?
Two reasons:
Linux is the preferred choice because it has traditionally been more performant than Windows for this kind of workload, meaning Windows users are getting gains Linux users already have.
And NVIDIA has a different codebase for Linux, so it might take time to bring some of those efficiency gains over to Linux.
Additionally, as a little aside, there's no way Apple is ever dealing with NVIDIA again, and they might dump AMD altogether in the coming years.