DeepSeek-V3 now runs at 20 tokens per second on Mac Studio, and that’s a nightmare for OpenAI

venturebeat.com

15 comments
  • The Mac Studio they talk about running the 685B model on costs $12,000 (after the 20% tax here). I get that it's a consumer device and that it draws less power, but at that point you'd just get a server for less. The power consumption is the one real outlier, since everything sits on a single chip (there's a back-of-envelope bandwidth sketch at the end of the thread).

    • I think the key part is that you can run these large-scale models cheaply in terms of energy cost (rough numbers at the end of the thread). Hardware prices will inevitably come down, but we now know there is no fundamental blocker to running these models efficiently.

      • I generally agree, but given how niche a powerful SoC like this is, I doubt it matters much in the near term (the next five years or so). I understand it proves a point, but I'd wager there's still a long way to go before power-efficient hardware like this gets cheap (and when it does, it will most likely come out of China).

        • Yeah, five years or so before SoC designs become dominant is a good guess. There are other interesting ideas, like analog chips, that could drastically cut power usage for neural networks as well. The next few years will be interesting to watch.

    • But imagine what you'll be able to run it on in another four months. That said, it's stretching the definition of consumer hardware a bit.
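
A back-of-envelope sketch of why ~20 tokens/s is plausible on a unified-memory SoC like this. All figures are assumptions, not from the article: DeepSeek-V3 activates roughly 37B of its ~685B parameters per token (it's a mixture-of-experts model), the weights are quantized to about 4 bits, and the M3 Ultra's advertised memory bandwidth is ~819 GB/s. Single-stream decoding is memory-bandwidth bound, so the ceiling is bandwidth divided by bytes read per token.

```python
# Bandwidth-bound decode ceiling; every figure here is an assumption.
ACTIVE_PARAMS = 37e9      # MoE: parameters touched per generated token
BYTES_PER_PARAM = 0.5     # ~4-bit quantization
BANDWIDTH_B_S = 819e9     # unified memory bandwidth, bytes/s

bytes_per_token = ACTIVE_PARAMS * BYTES_PER_PARAM    # ~18.5 GB per token
ceiling_tok_s = BANDWIDTH_B_S / bytes_per_token      # ~44 tokens/s

print(f"weights read per token: {bytes_per_token / 1e9:.1f} GB")
print(f"bandwidth-bound ceiling: {ceiling_tok_s:.0f} tokens/s")
# The reported 20 tokens/s is roughly half this ceiling, a plausible
# real-world efficiency for single-stream decoding.
```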
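
In the same spirit, a rough sketch of the energy-cost point above. The power draw and electricity price are assumed round numbers, not measurements from the article.

```python
# Energy cost per million generated tokens; every figure is assumed.
POWER_W = 300       # assumed whole-machine draw while decoding
TOK_PER_S = 20      # throughput from the headline
PRICE_KWH = 0.20    # assumed electricity price, USD per kWh

seconds = 1e6 / TOK_PER_S                # time to generate 1M tokens
kwh = POWER_W * seconds / 3_600_000      # watts * seconds -> kWh
print(f"hours per 1M tokens: {seconds / 3600:.1f}")
print(f"energy: {kwh:.1f} kWh, about ${kwh * PRICE_KWH:.2f} per 1M tokens")
# Roughly 14 hours and ~4 kWh: under a dollar of electricity per
# million tokens at these assumptions.
```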
