For tool-result data: how do you let the LLM know?
When an LLM calls a tool, the tool usually returns some sort of value, usually a string containing some info like ["Tell the user that you generated an image", "Search query results: [...]"].
So how do you tell the LLM the output of the tool call?
I know that some models like llama3.1 have a built-in tool "role", which lets you feed the result back to the model, but not all models have that. Non-tool-tuned models in particular don't. So let's find a different approach!
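For comparison, with a built-in tool role the result just becomes its own message in the conversation. Very roughly, and heavily simplified (the exact role name and tool-call schema differ between models and APIs), that looks like this:

```python
# Simplified, OpenAI-style message list for a model with a built-in tool role.
# The exact role name and tool-call schema differ between models and APIs.
messages = [
    {"role": "user", "content": "look up today's weather in New York"},
    {"role": "assistant", "content": "", "tool_calls": [
        {"name": "web_search", "args": {"query": "weather in New York today"}},
    ]},
    {"role": "tool", "content": 'Search results: ["The temperature is 19° Celsius"]'},
]
```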
Approaches
Appending the result to the LLM's message and letting it continue generating
Let's say, for example, a non-tool-tuned model decides to use a web_search tool. Now some code runs it and returns an array with info. How do I inform the model? Do I just put the info after the user prompt? This is how I do it right now:
- System: you have access to tools [...] Use this format [...]
- User: look up today's weather in New York
- LLM: Okay, let me run a search query
  <tool>
  {"name":"web_search", "args":{"query":"weather in New York today"}}
  </tool>
  <result>
  Search results: ["The temperature is 19° Celsius"]
  </result>
  Today's temperature in New York is 19° Celsius.
Here everything in the <result> tags is added on programmatically, and the message after the closing </result> tag is then generated by the model in a second pass. So everything within tags is not shown to the user, but the rest is. I like this way of doing it, but it does feel weird to insert stuff into the LLM's generation like that.
Appending tool result to user message
Sometimes I opt for a setup where the LLM goes through a multi-step decision process about the tool calling, then optionally actually calls a tool, and the result is then appended to the original user message, without a trace of the actual tool call:
```plaintext
What is the weather like in New York? <tool_call_info> You automatically ran a search query, these are the results: [some results here] Answer the message using these results as the source. </tool_call_info>
```
This works, but it feels like a hacky way to reach a solution that should be obvious.
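A minimal sketch of that variant, with the tag name and wording taken straight from the example above (none of it is standard, it's just string concatenation):

```python
# Sketch of the "append to the user message" variant. The <tool_call_info> tag and
# the phrasing are just the conventions from this post, nothing standard.
def augment_user_message(user_message: str, results: list[str]) -> str:
    return (
        f"{user_message} <tool_call_info> You automatically ran a search query, "
        f"these are the results: {results} "
        "Answer the message using these results as the source. </tool_call_info>"
    )

messages = [
    {"role": "user", "content": augment_user_message(
        "What is the weather like in New York?",
        ["The temperature is 19° Celsius"],
    )},
]
```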
The lazy option: a custom chat format
Orrrr you just use a custom chat format. Ditch <|endoftext|> as your stop keyword and embrace your new best friend: "\nUser: "!
So, the chat template goes something like this:
```yaml
User: blablabla hey can u help me with this
Assistant Thought: Hmm maybe I should call a tool? Hmm let me think step by step. Hmm i think the user wants me to do a thing. Hmm so i should call a tool. Hmm
Tool: {"name":"some_tool_name", "args":[u get the idea]}
Result: {some results here}
Assistant: blablabla here is what i found
User: blablabla wow u are so great thanks ai
Assistant Thought: Hmm the user talks to me. Hmm I should probably reply. Hmm yes I will just reply. No tool needed
Assistant: yesyes of course, i am super smart and will delete humanity some day, yesyes [...]
```
Again, this works, but it generally results in worse performance: current instruction-tuned LLMs are, well, tuned on a specific chat template, so prompting them outside of it naturally degrades quality. It also requires multi-shot prompting for the model to pick up how this new template works, and it may still generate some unwanted roles, like "Assistant Action: Walks out of compute center and enjoys life", which can be funny, but is unwanted.
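If you do go down this road anyway, the plumbing is at least trivial. Something like this, where the role names mirror the template above and the stop handling depends on your backend:

```python
# Minimal sketch of the custom chat format: flatten the history into the
# role-prefix template above and stop on "\nUser: " instead of the model's
# native end-of-turn token.
STOP_SEQUENCES = ["\nUser: "]  # hand these to your backend instead of <|endoftext|>

def build_prompt(turns: list[tuple[str, str]], user_message: str) -> str:
    # `turns` is the prior history as (role, text) pairs,
    # e.g. ("Assistant Thought", "Hmm maybe I should call a tool?").
    lines = [f"{role}: {text}" for role, text in turns]
    lines.append(f"User: {user_message}")
    lines.append("Assistant Thought:")  # nudge the model into the thought step first
    return "\n".join(lines)
```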
Conclusion
Eh, I just append the result to the user message with some tags and am done with it.
It's super easy to implement, but I also really like the insert-into-assistant approach, since the model then uses tools in an in-chat way, maybe even calling multiple tools in succession, in an almost agent-like way.
But YOU! Tell me how you approach this problem! Maybe you have come up with a better approach, maybe even while reading this post here.
Please share your thoughts, so we can all have a good CoT about it.