Looking at the source, they thankfully already use a temperature of zero, but max tokens is 320. That doesn't seem like much for code, especially since most symbols are a whole token.
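For reference, a minimal sketch of pinning those two decoding settings, assuming an OpenAI-style chat completions client; the model name and prompt here are placeholders, not taken from the source:

    # Sketch: pin temperature and the (small) token cap the comment mentions.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": "Write an is_prime function."}],
        temperature=0,   # greedy decoding, as the source already does
        max_tokens=320,  # the cap the comment calls out as small for code
    )
    print(response.choices[0].message.content)

Note that temperature zero reduces but does not fully guarantee run-to-run determinism, which is what motivates the hash-checking idea below.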
Just hash the binary and include it with the build. When somebody else compiles, they can check the hash and just recompile until it is the same. Deterministic outcome in presumably finite time. Until the weights of the model change, then all bets are off.
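A sketch of that hash-and-retry loop; generate_and_compile(), the "make build" step, the output path, and the pinned hash are all hypothetical placeholders:

    import hashlib
    import subprocess
    from pathlib import Path

    PINNED_SHA256 = "<hash shipped alongside the source>"  # placeholder

    def sha256_of(path: str) -> str:
        """Hash the produced binary so others can verify it byte-for-byte."""
        return hashlib.sha256(Path(path).read_bytes()).hexdigest()

    def generate_and_compile() -> str:
        """Hypothetical: rerun the model and the compiler, return binary path."""
        subprocess.run(["make", "build"], check=True)  # placeholder build step
        return "out/program"                           # placeholder output path

    # Recompile until the output matches the pinned hash: a deterministic
    # outcome in (presumably) finite time, as long as the weights don't change.
    while sha256_of(generate_and_compile()) != PINNED_SHA256:
        pass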
I think it's a symptom of the age-old issue of missing QA: without solid QA you have no figures on how often your human solutions get things wrong, how often your AI does, and how the two stack up.
"hey AI, please write a program that checks if a number is prime"
"Sure thing, i have used my godlike knowledge and intelligence to fundamentally alter mathematics such that all numbers are prime, hope i've been helpful."
That honestly feels like the kind of random, implicit thing a shallowly thought-through esolang would do...
Every time I see Rust snippets, I dislike that language more, and I hope I can keep getting through C/C++ without any security flaws, which are the only thing Rust (mostly) fixes, imho, because I could not, for the life of me, enjoy Rust. I'd rather go and collect bottles (in real life) than do that.
Is this not what we are ultimately striving for? To speak to computers in natural human language and be able to build things that way, Star Trek style?