Large Language Models up to 4x Faster on RTX With TensorRT-LLM for Windows
Generative AI on PC is getting up to 4x faster via TensorRT-LLM for Windows, an open-source library that accelerates inference performance.
cross-posted from: https://lemdro.id/post/2377716 (!aistuff@lemdro.id)
Dang, I need to try these. For now, only the Stable Diffusion extension for Automatic1111 is available.
I wonder if it will accelerate 30B models that don't fit entirely in GPU VRAM.
If it only accelerates 13B models, those were already fast enough.
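For anyone wondering whether a 30B model fits in VRAM, here's a quick back-of-envelope sketch (my own estimate, not from the article): the weights alone take roughly parameter-count times bytes-per-parameter, before any KV cache or activation overhead.

```python
def weights_vram_gib(params_billion: float, bytes_per_param: float) -> float:
    """Approximate GPU memory for model weights alone, in GiB.

    Ignores KV cache, activations, and framework overhead, so treat
    the result as a lower bound on what you actually need.
    """
    return params_billion * 1e9 * bytes_per_param / (1024 ** 3)

# A 30B model: fp16 uses 2 bytes/param, 4-bit quantization ~0.5 bytes/param.
fp16_gib = weights_vram_gib(30, 2.0)   # ~56 GiB: won't fit on a 24 GB card
int4_gib = weights_vram_gib(30, 0.5)   # ~14 GiB: fits on a 24 GB card
print(f"30B fp16: {fp16_gib:.1f} GiB, 30B 4-bit: {int4_gib:.1f} GiB")
```

So a 30B model only fits entirely in a 24 GB consumer GPU when quantized; at fp16 it would have to be split or offloaded.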