How well can LLMs solve chess puzzles?
Benchmark LLM reasoning capability by solving chess puzzles. - kagisearch/llm-chess-puzzles
Each LLM is given the same 1000 chess puzzles to solve. See puzzles.csv. Benchmarked on Mar 25, 2024.
Model | Solved | Solved % | Illegal Moves | Illegal Moves % | Adjusted Elo |
---|---|---|---|---|---|
gpt-4-turbo-preview | 229 | 22.9% | 163 | 16.3% | 1144 |
gpt-4 | 195 | 19.5% | 183 | 18.3% | 1047 |
claude-3-opus-20240229 | 72 | 7.2% | 464 | 46.4% | 521 |
claude-3-haiku-20240307 | 38 | 3.8% | 590 | 59.0% | 363 |
claude-3-sonnet-20240229 | 23 | 2.3% | 663 | 66.3% | 286 |
gpt-3.5-turbo | 23 | 2.3% | 683 | 68.3% | 269 |
claude-instant-1.2 | 10 | 1.0% | 707 | 70.7% | 245 |
mistral-large-latest | 4 | 0.4% | 813 | 81.3% | 149 |
mixtral-8x7b | 9 | 0.9% | 832 | 83.2% | 136 |
gemini-1.5-pro-latest* | FAIL | - | - | - | - |
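
The percentage columns appear to be each count divided by the 1000 puzzles. A minimal sketch of that arithmetic, assuming nothing more than that; the names `TOTAL_PUZZLES`, `ROWS`, `pct`, and `summarize` are made up for illustration and are not from the repo. The Adjusted Elo formula is not given here, so it is not reproduced.

```python
# Assumption: percentages are simply count / total puzzles.
TOTAL_PUZZLES = 1000  # from the post: every model gets the same 1000 puzzles

# (model, solved, illegal_moves) copied from a few rows of the table
ROWS = [
    ("gpt-4-turbo-preview", 229, 163),
    ("gpt-4", 195, 183),
    ("claude-instant-1.2", 10, 707),
]


def pct(count, total=TOTAL_PUZZLES):
    """Format count as a percentage of total with one decimal, like the table."""
    return f"{100 * count / total:.1f}%"


def summarize(rows):
    """Map each model name to its (solved %, illegal moves %) pair."""
    return {model: (pct(solved), pct(illegal)) for model, solved, illegal in rows}


if __name__ == "__main__":
    for model, (solved_pct, illegal_pct) in summarize(ROWS).items():
        print(f"{model} | {solved_pct} | {illegal_pct}")
```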
Published by the CEO of Kagi!
I wonder why gpt-4 is so good at chess.
If I tried to make an illegal move 20% of the time, would you also say I am good at chess?
Depends on circumstances, obviously.
Okay. What if the circumstance is that I'm just recalling a bunch of chess puzzle solutions I've seen before and regurgitating the one I think is correct for this particular puzzle, without really understanding the rules of chess?
That's another thing I'm wondering about, and so is everyone else. I'd still want to know why GPT-4 does so much better than the others.