Does Bing Chat give reliable answers to math and physics questions? If not, is it possible to make it more reliable?
I realize and understand the criticisms of ChatGPT, and I have personally seen how bad it can be. Once I asked it to count the number of days until a random date, giving it the present date, and it failed miserably, again and again. Trust me! I get the criticism. But what about Bing Chat?
Have you ever tried asking it your physics and math questions? A while ago I was coding and had a pretty complex question that a very popular Reddit coding community couldn't solve, but Bing Chat gave me an answer in an instant! I was genuinely impressed. Apparently it checks multiple webpages for answers, reads and understands what it finds, and gives an answer that combines the knowledge it gained from its search. Again, the question I asked was pretty complex, but it answered in an instant and the answer was right! And that was for code, where I've found it's pretty hard to get the right answer on the first try; it's usually more "trial and error".
So yeah!
1. Can I rely, at least partially, on Bing Chat for math questions?
2. If not, can I ask it to form a query that encapsulates my question perfectly?
3. If not, should I ask it to "Answer this question and cite your sources"?
4. Can I do something more, like what I did in 3? What are your thoughts on this?
I won't be able to reply to each of your comments anytime soon, but know that I deeply appreciate this community, its members, and their help :')
These models aren't great at tasks that require precision and analytical thinking. They're trained on a fairly simple task, "if I give you some text, guess what the next bit of text is." Sounds simple, but it's incredibly powerful. Imagine if you could correctly guess the next bit of text for the sentence "The answer to the ultimate question of life, the universe, and everything is" or "The solution to the problems in the Middle East is".
Recently, we've been seeing shockingly good results from models that do this task. They can synthesize unrelated subjects, and hold coherent conversations that sound very human. However, despite doing some things that up until recently only humans could do, they still aren't at human-level intelligence. Humans read and write by taking in words, converting them into rich mental concepts, applying thoughts, feelings, and reasoning to them, and then converting the resulting concepts back into words to communicate with others. LLMs might arguably be doing some of this too, but they're evaluated solely on words, so much more of their "thought process" is based on "what words are likely to come next" and not "is this concept being applied correctly" or "is this factual information". Humans have much, much greater capacity than these models, and we live complex lives that act as an incredibly comprehensive training process. These models are small and trained very narrowly in comparison. Their excellent mimicry gives the illusion of a similarly rich inner life, but it's mostly imitation.
All that comes down to the fact that these models aren't great at complex reasoning and precise details. They're just not trained for it. They got through "life" by picking plausible words, and that's mostly what they'll continue to do. For writing a novel or poem, that's good enough, but math and physics are more rigorous than that. They do seem to be able to handle code snippets now, mostly, which is progress, but in general this isn't something you can be completely confident they'll do correctly. They make silly mistakes because they aren't really thinking it through. To them, there isn't much difference between answers like "that date is 7 days after Christmas" and "that date is 12 days after Christmas." Which one it thinks is more correct is based on things it has seen, not necessarily on an explicit counting process. You can also see this in the case where someone used it to write a legal brief and it came up with citations that seemed plausible but were in fact completely made up. It wasn't trained on accurate citations; it was trained on words.
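For contrast, an explicit counting process is trivial for ordinary code. Here's a minimal Python sketch of the "days until a date" question from the original post; the dates are just examples, and `days_until` is a hypothetical helper name.

```python
from datetime import date


def days_until(today: date, target: date) -> int:
    """Days from `today` to `target` (negative if `target` is in the past)."""
    return (target - today).days


print(days_until(date(2023, 12, 25), date(2024, 1, 1)))  # → 7
print(days_until(date(2023, 3, 1), date(2023, 12, 25)))  # → 299
```

The subtraction actually walks the calendar (month lengths, leap years), so the answer is computed rather than guessed from what "looks right".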
They also have a bad habit of sounding confident no matter what they're saying, which makes it hard to use them for things you can't check yourself. Anything they say could be right/accurate/good/not plagiarized, but the model won't have a good sense of that, and if you don't know either, you're opening yourself up to risk of being misled.
They can definitely be made to work out arithmetic and similar things, though.
For example, you could put something like this in the preprompt (system prompt):
When asked a mathematical question, please respond with the equations used to achieve the result
Then if you asked it what 3x4 is, it could respond with "The answer is {3x4}", and the {3x4} could be evaluated in software afterwards, with the result dropped in for the user to see.
I think that might be what ChatGPT does now, since they somewhat recently fixed it always getting maths wrong.
Alternatively, you could simply ask it to write a script that works out whatever non-linguistic problem it's given, and execute that in a sandboxed environment (though that might still be too risky in case it generates some bad code).
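The script route might look something like this minimal Python sketch, which runs the generated code in a separate interpreter with a time limit. To be clear, a subprocess is not a real sandbox: the code still has the same file and network access you do, so for genuinely untrusted code you'd want a container or VM on top of this. `run_generated_script` and the example script are hypothetical.

```python
import subprocess
import sys


def run_generated_script(code: str, timeout_s: float = 5.0) -> str:
    """Run model-generated Python in a separate interpreter and return its stdout.

    NOTE: this is NOT a security sandbox. It only isolates the process and
    enforces a time limit; the code can still touch files and the network.
    """
    result = subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode (ignores PYTHON* env vars, user site-packages)
        capture_output=True,
        text=True,
        timeout=timeout_s,  # raises subprocess.TimeoutExpired if exceeded
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip())
    return result.stdout.strip()


# A script a model might plausibly produce for "how many days from Christmas 2023 to New Year 2024?"
generated = "from datetime import date\nprint((date(2024, 1, 1) - date(2023, 12, 25)).days)"
print(run_generated_script(generated))  # → 7
```

The model only has to get the *method* right; the actual counting is done by the interpreter.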