For tool-result data: how do you let the LLM know?
When an LLM calls a tool, the tool usually returns some sort of value, usually a string containing some info like ["Tell the user that you generated an image", "Search query results: [...]"].
So how do you tell the LLM the output of the tool call?
I know that some models like llama3.1 have a built-in tool "role", which lets you feed the result back to the model, but not all models have that. Non-tool-tuned models in particular don't. So let's find a different approach!
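For comparison, with a built-in tool role the result just becomes its own message in the conversation. Very roughly, and heavily simplified (the exact role name and tool-call schema differ between models and APIs), that looks like this:

```python
# Simplified, OpenAI-style message list for a model with a built-in tool role.
# The exact role name and tool-call schema differ between models and APIs.
messages = [
    {"role": "user", "content": "look up today's weather in New York"},
    {"role": "assistant", "content": "", "tool_calls": [
        {"name": "web_search", "args": {"query": "weather in New York today"}},
    ]},
    {"role": "tool", "content": 'Search results: ["The temperature is 19° Celsius"]'},
]
```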
Approaches
Appending the result to the LLM's message and letting it continue generating
Let's say, for example, a non-tool-tuned model decides to use a web_search tool. Now some code runs it and returns an array with info. How do I inform the model? Do I just put the info after the user prompt? This is how I do it right now:
- System: you have access to tools [...] Use this format [...]
- User: look up today's weather in New York
- LLM: Okay, let me run a search query
  <tool>
  {"name":"web_search", "args":{"query":"weather in New York today"}}
  </tool>
  <result>
  Search results: ["The temperature is 19° Celsius"]
  </result>
  Today's temperature in New York is 19° Celsius.
Here everything in the <result> tags is added on programmatically, and the message after the closing </result> tag is then generated by the model in a second pass. So everything within tags is not shown to the user, but the rest is. I like this way of doing it, but it does feel weird to insert stuff into the LLM's generation like that.
Appending tool result to user message
Sometimes I opt for a setup where the LLM goes through a multi-step decision process about the tool calling, then optionally actually calls a tool, and the result is then appended to the original user message, without a trace of the actual tool call:
```plaintext
What is the weather like in New York? <tool_call_info> You automatically ran a search query, these are the results: [some results here] Answer the message using these results as the source. </tool_call_info>
```
This works, but it feels like a hacky way to reach a solution that should be obvious.
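A minimal sketch of that variant, with the tag name and wording taken straight from the example above (none of it is standard, it's just string concatenation):

```python
# Sketch of the "append to the user message" variant. The <tool_call_info> tag and
# the phrasing are just the conventions from this post, nothing standard.
def augment_user_message(user_message: str, results: list[str]) -> str:
    return (
        f"{user_message} <tool_call_info> You automatically ran a search query, "
        f"these are the results: {results} "
        "Answer the message using these results as the source. </tool_call_info>"
    )

messages = [
    {"role": "user", "content": augment_user_message(
        "What is the weather like in New York?",
        ["The temperature is 19° Celsius"],
    )},
]
```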
The lazy option: a custom chat format
Orrrr you just use a custom chat format. Ditch <|endoftext|> as your stop keyword and embrace your new best friend: "\nUser: "!
So, the chat template goes something like this:
```yaml
User: blablabla hey can u help me with this
Assistant Thought: Hmm maybe I should call a tool? Hmm let me think step by step. Hmm i think the user wants me to do a thing. Hmm so i should call a tool. Hmm
Tool: {"name":"some_tool_name", "args":[u get the idea]}
Result: {some results here}
Assistant: blablabla here is what i found
User: blablabla wow u are so great thanks ai
Assistant Thought: Hmm the user talks to me. Hmm I should probably reply. Hmm yes I will just reply. No tool needed
Assistant: yesyes of course, i am super smart and will delete humanity some day, yesyes [...]
```
Again, this works, but it generally results in worse performance: current instruction-tuned LLMs are, well, tuned on a specific chat template, so prompting them outside of it naturally degrades quality. It also requires multi-shot prompting for the model to pick up how this new template works, and it may still generate some unwanted roles, like "Assistant Action: Walks out of compute center and enjoys life", which can be funny, but is unwanted.
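If you do go down this road anyway, the plumbing is at least trivial. Something like this, where the role names mirror the template above and the stop handling depends on your backend:

```python
# Minimal sketch of the custom chat format: flatten the history into the
# role-prefix template above and stop on "\nUser: " instead of the model's
# native end-of-turn token.
STOP_SEQUENCES = ["\nUser: "]  # hand these to your backend instead of <|endoftext|>

def build_prompt(turns: list[tuple[str, str]], user_message: str) -> str:
    # `turns` is the prior history as (role, text) pairs,
    # e.g. ("Assistant Thought", "Hmm maybe I should call a tool?").
    lines = [f"{role}: {text}" for role, text in turns]
    lines.append(f"User: {user_message}")
    lines.append("Assistant Thought:")  # nudge the model into the thought step first
    return "\n".join(lines)
```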
Conclusion
Eh, I just append the result to the user message with some tags and am done with it.
It's super easy to implement, but I also really like the insert-into-assistant approach, since the model then uses tools in an in-chat way, maybe even calling multiple tools in succession, in an almost agent-like way.
But YOU! Tell me how you approach this problem! Maybe you have come up with a better approach, maybe even while reading this post here.
Please share your thoughts, so we can all have a good CoT about it.