A 1-bit LLM performs comparably to full-precision Transformer LLMs of the same model size trained on the same number of tokens, while being far more efficient in terms of latency, memory, throughput, and energy consumption.
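For anyone curious what "1-bit" (really 1.58-bit, ternary) weights look like in practice, here's a minimal NumPy sketch of the absmean quantization scheme described in the BitNet b1.58 paper; the function name and usage are mine, not from any released library:

```python
import numpy as np

def absmean_ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    """Quantize a weight tensor to {-1, 0, +1} using its mean absolute value
    as the scale (absmean scheme from the BitNet b1.58 paper)."""
    gamma = np.mean(np.abs(w)) + eps                 # per-tensor scale
    w_ternary = np.clip(np.round(w / gamma), -1, 1)  # round, then clip to {-1, 0, +1}
    return w_ternary.astype(np.int8), gamma          # keep gamma to rescale outputs

# Tiny usage example
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, scale = absmean_ternary_quantize(w)
print(q)      # entries are -1, 0, or +1
print(scale)  # single floating-point scale for the whole tensor
```

With weights restricted to -1, 0, and +1, matrix multiplies reduce to additions and subtractions plus one rescale, which is where the latency and energy savings come from.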
Why use lot bit when one bit do trick?
Bits together weak