Phind is now using a V7 of their model for their own platform, as they have found that people overall prefer that output vs GPT4. This is extremely impressive because it's not just a random benchmark that can be gamed, but instead crowd sourced opinion on real tasks
The one place everything still lags behind GPT4 is question comprehension, but this is a huge accomplishment
this is one of the most plausible claims to date because it is supported by anecdotal data from actual use scenarios rather than only benchmark games. puppet hockey