my friend is doing gig work to make chatgpt better at just boring old textbook physics problems, and it's complete dogshit at it. so uh, sure man. nice you got there.
Didn't they manage to make it somewhat good at solving certain math competition problems? Regardless, it's a pretty big jump from that to making a breakthrough in physics.
maybe certain ones, but it's generally bad about numbers and mathematical reasoning. he also gets paid to make it fail at math, and it's arguably worse at basic math than physics.
Yeah, DeepMind had good results with IMO problems, but only geometry problems. They scored almost at the level of a gold medalist. That's only a fraction of IMO problems, though. They did it by combining a formal verification system with an LLM to propose solution paths, and then doing some tree search, I think.
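To make the "LLM proposes, checker verifies, search ties it together" idea concrete, here's a rough toy sketch. It is not DeepMind's system: the number puzzle, `propose`, `verify`, and `search` are all made-up stand-ins, with a crude heuristic playing the role of the LLM and an exact recomputation playing the role of the formal checker.

```python
import heapq

# Hypothetical toy domain: reach `target` from a positive `start`
# using only the steps below. Everything here is an illustration.
STEPS = {"add3": lambda x: x + 3, "double": lambda x: x * 2}

def propose(state, target):
    """Stand-in for the LLM: rank candidate steps by a crude heuristic."""
    return sorted(STEPS, key=lambda name: abs(target - STEPS[name](state)))

def verify(state, name, new_state):
    """Stand-in for the formal checker: accept only steps that really follow."""
    return name in STEPS and STEPS[name](state) == new_state

def search(start, target, max_nodes=10_000):
    """Best-first search over verified steps; returns step names or None."""
    frontier = [(abs(target - start), start, [])]
    seen = {start}
    while frontier and max_nodes > 0:
        max_nodes -= 1
        _, state, path = heapq.heappop(frontier)
        if state == target:
            return path
        for name in propose(state, target):
            new_state = STEPS[name](state)
            # Both steps only increase a positive number, so overshooting is a dead end.
            if new_state in seen or new_state > target:
                continue
            if not verify(state, name, new_state):
                continue  # never expand a step the checker rejects
            seen.add(new_state)
            heapq.heappush(
                frontier, (abs(target - new_state), new_state, path + [name])
            )
    return None
```

The point of the split is that the proposer can be wrong as often as it likes: a bad suggestion just gets rejected or leads to a dead end, and only checked steps ever end up in the returned path.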
This is one way to improve large AI systems and will probably be incorporated in some way in the future, for example by integrating with a language like Lean (for math proofs).
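For anyone who hasn't seen it, this is roughly what a machine-checked statement looks like in Lean 4. The theorem name is made up for illustration; `Nat.add_comm` is from Lean's core library. If the proof were wrong, Lean would simply refuse to accept it, which is what makes it useful as a checker for model-generated proofs.

```lean
-- A trivial machine-checked fact: evaluating both sides gives the same number.
example : 2 + 2 = 4 := rfl

-- Commutativity of addition on naturals, proved by citing the core lemma.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```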
They will also be improved by combining with tool use like calculators, code interpreters, web search, calendars, etc. This is already starting to happen to some extent.
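A minimal sketch of what calculator tool use could look like, assuming a made-up convention where the model emits a line starting with `CALC:` when it wants arithmetic done for it. The `CALC:` prefix and the `calculator` / `run_tool` names are hypothetical, not any real product's API.

```python
import ast
import operator

# Only plain arithmetic is allowed; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def calculator(expr: str):
    """Evaluate an arithmetic expression without calling eval()."""
    def ev(node):
        if isinstance(node, ast.Constant):
            value = node.value
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                return value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr.strip(), mode="eval").body)

def run_tool(model_output: str) -> str:
    """If the (hypothetical) model asked for the calculator, do the math for it."""
    prefix = "CALC:"
    if model_output.startswith(prefix):
        return str(calculator(model_output[len(prefix):]))
    return model_output  # ordinary text passes through unchanged
```

So the model only has to decide that it needs `12345 * 6789` computed; the actual multiplication is done by ordinary code instead of by token prediction.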
LLMs by themselves, at least with current transformer architectures, are not great at reasoning (counting, arithmetic, symbolic reasoning).